
An Introduction to Statistical Methods


C. B. GUPTA, Ph. D.
Vice Principal &
Head of the Department of Commerce
Shri Ram College of Commerce,

Visiting Lecturer
Deptt. of Business Management
& Industrial Administration
University of Delhi.




RAM PRASAD AND SONS : AGRA-3 







© author 


First Edition

1957

Second Edition 

mi 

Third Edition 

1984 

Fourth Edition

me 

Reprint 

mi 

Reprint Edition 

1968 


NEW ERA OFFSET PRINTERS : DELHI-6



Acknowledgments 


In writing this book I am indebted to many. I have consulted
works of many authorities on the subject whose names
I have mentioned either in the text itself or in the list of
selected references given in the Appendix. If, however, I
have inadvertently made any adaptation where I should have
obtained permission, I hope it will be excused as an oversight.

Several of my friends assisted me in this work at various
stages. I gratefully acknowledge my indebtedness to them
all and in particular to Shri Shanti Swaroop Gupta and Shri
Om Prakash Vaish, my old students at Delhi, and Shri Vidya
Ratna, Shri B. D. Agrawala, Shri J. K. Gautam and Shri
R. L. Gupta, my colleagues in the Commerce Department.

Principal Jai Narayan Vaish has always been a source of
inspiration to me and I am grateful to him for valuable advice
and encouragement.



The figures in parentheses at the end of an
exercise indicate the corresponding number
of the problem in the author's other book
entitled 'Statistical Calculations'.



Preface to the Fourth Edition 


The increasing demand for this book provided me an
opportunity of checking it up again within almost a year of its
third edition. Many parts of the book have been completely
re-written. The chapter on 'Sampling and Statistical Inference'
has been considerably expanded so as to include 'Small
Samples', 'Z-test', 'The Variance Ratio Test' and 'Chi-Square
Test'. Two new chapters, one on 'Vital Statistics' and the
other on 'Statistical Quality Control', have been added, and the
information on Indian Statistics has been brought up to date.

While offering this edition I express my thanks to friends
who favoured me with their comments and suggestions for its
improvement, and in particular to Shri S. R. Zarapkar, Siddharth
College, Bombay.

I wish to thank Shri S. P. Gupta and Shri R. N. Goel, my
colleagues in the Commerce Department, who assisted me in
the preparation of the revised edition.


Author 



Preface to the First Edition


An important development that has taken place in the field 
of higher education in India during the last two decades is 
the inclusion of statistics in the curricula of most of our 
universities. Statistics is now being taught not only to students 
preparing for a degree in Mathematics or Mathematical 
Statistics, but also to students of Business Administration, 
Accountancy, Natural Sciences, Technology, Medicine, Education,
etc. This is as it should have been.

Statistics is based largely on mathematics. If one is to 
have a correct idea of the various statistical concepts, it is 
necessary that he should have, if not a thorough grounding
in mathematics, at least an aptitude for it. A large number
of our students who offer Commerce or Economics for their
graduation have no liking for mathematical work and to them
statistics is not quite interesting.

The aim of this book is to create a liking for statistics in 
such students or at least to dispel their fear of the subject. 
An endeavour has been made to explain statistical concepts
in non-technical language. The statistical formulae, wherever
possible, have been described in the language of words rather
than of figures. The plan of the book is based on the classroom
practice of discussing a new topic only when a proper
background has been prepared for it.

I hope the book will meet the needs of the students whom
it is intended to serve.


Author 


Table of Contents 


Chapter 1
Introduction
    Scientific and Statistical Methods
    The Meaning of Statistics
    Definitions of Statistics
    Statistical Analysis and Inference
    Origin of Statistics

Chapter 2
Functions, Importance and Limitations
    Functions of Statistics
    Importance of Statistics
    Statistics and Business
    Statistics and Economics
    Limitations of Statistics
    Distrust of Statistics

Chapter 3
Statistical Inquiries

Chapter 4
Collection of Data - I
    Preliminary Consideration
    Statement of Purpose
    The Plan of Procedure
    The Scope of Inquiry
    Determination of the Unit: Counting and Measurement
    Importance of Statistical Units
    Classification of Units
    Origination of Units
    Classification of Statistical Units as Regards Their Origin
    Classification According to Functions of Units
    Technique of Data Collection
    The Degree of Accuracy
    Approximation

Chapter 5
Collection of Data - II
    Primary Methods
    Secondary Methods
    Editing Primary Data
    Editing Secondary Data

Chapter 6
Sampling and the Concept of Error
    Law of Statistical Regularity
    Law of Inertia of Large Numbers
    Census Versus Sample Enumeration
    Sampling Methods
    Systematic Sampling, Random Sampling
    Stratified Sampling
    Multi-Stage Sampling
    Statistical Error
    Exercises

Chapter 7
Classification and Tabulation
    Classification
    Statistical Series
    Continuous and Discrete Series
    Time, Spatial or Condition Series
    The Array
    The Selection of Classes
    Methods of Designating Class Interval
    Frequency Distribution (Continuous Variable)
    Mid-Point of a Class Interval and the Determination of Real Limit
    Continuous Series
    Tabulation
    A Form of Table
    General Purpose and Special Purpose Tables
    Original and Derivative Tables
    Simple and Complex Tables
    Exercises

Chapter 8
Diagrammatic Representation
    Rules of Drawing Diagrams
    Charting: Categorical Series
    Bar Diagrams
    Simple Bar
    Sub-Divided Bar
    Percentage Bar
    Bilateral Bar
    Split Bar
    Rectangles
    Squares
    Circles
    Maps and Pictures
    Exercises

Chapter 9
Graphic Presentation
    Rectangular Co-ordinates
    Charting Time Series
    False Base Line
    Comparison of Time Series
    Component-Part Line Chart
    Semi-Logarithmic or Ratio Scale
    Shape of the Curve on the Two Scales
    Construction of a Ratio Scale
    Cycles of Ratio Scale
    Charting Frequency Series
    Charting Frequency Distribution of Discrete Type
    Charting Frequency Distribution of Continuous Type
    Histogram
    Frequency Polygon
    Smooth Frequency Curve
    Ogive Curve or Cumulative Frequency Curve
    Charting Cumulative Frequency Distribution
    Cumulative Percentage Curve
    Exercises

Chapter 10
Measures of Central Tendency
    Types of Averages
    Arithmetic Mean
    Computing Arithmetic Mean: Individual Observation
    Computing Arithmetic Mean: Discrete Frequency Distribution
    Arithmetic Mean from Continuous Frequency Distribution: Long Method
    Arithmetic Mean from Grouped Data: Short Method
    Properties of the Arithmetic Mean
    Nature and Significance of Arithmetic Mean
    The Median
    The Median from Ungrouped Data
    The Median from Grouped Data
    Location of Median by Graphic Analysis
    Properties of the Median
    Nature and Significance of Median
    Quartiles, Deciles and Percentiles
    Location of Quartiles, Deciles, etc.
    Locating Graphically the Quartiles, etc.
    The Mode
    Locating the Most Frequently Repeated Value in the Array
    Estimating the Mode by Interpolation
    Locating the Mode by Graphic Method
    Estimating the Mode from the Mean and the Median
    Properties of Mode
    The Geometric Mean
    Characteristics of Geometric Mean
    Uses of Geometric Mean
    The Harmonic Mean
    Selecting the Average
    Weighted Mean
    Exercises

Chapter 11
Measures of Dispersion
    Dispersion
    Definition
    Range
    Semi-Inter-Quartile Range
    Mean Deviation
    Standard Deviation
    Lorenz Curve
    Exercises

Chapter 12
Skewness, Moments and Kurtosis
    Type of Distribution
    Measures of Skewness
    Skewness Measured by Relationship Between 3 M's of Central Tendency
    Quartile Measure of Skewness
    Moments
    Relation Between μ and ν
    Measures of Skewness Based on Moments
    Kurtosis
    Sheppard's Correction for Grouping Errors
    Exercises

Chapter 13
Probability
    Equally Likely Events
    Permutations and Combinations
    Permutations
    Number of Different Permutations of n Things Taken r at a Time
    Number of Permutations of n Things, of which n1 Are All Alike and of One Kind, n2 All Alike of Another Kind and so on
    Combinations
    Number of Combinations of n Dissimilar Things Taken r at a Time
    Relationship between Combinations and Permutations
    Problems on Probability Using the Concept of Combinations and Permutations
    Simple and Compound Events
    Addition Theorem
    Mutually Exclusive
    Multiplication Theorem

Chapter 14
Binomial, Normal and Poisson Distributions
    Binomial Distribution
    Explanation
    General Form of Binomial Distribution
    Mean and Standard Deviation of Binomial Distribution
    Normal Distribution
    Discovery of the Normal Curve
    Factors Which Lead to the Emergence of a Normal Curve
    Mathematical Equation
    Standardized Normal Form
    Properties of the Normal Curve
    Fitting a Normal Curve to an Observed Distribution
    Testing the Normality of a Certain Distribution
    Advantages of the Normal Curve
    Poisson Distribution
    Utility of Poisson Distribution
    Properties of Poisson Distribution
    Fitting a Poisson Distribution
    Problems Involving the Use of Poisson Distribution
    Calculation of A. Mean and Standard Deviation of Poisson Distribution
    Proof for Poisson Distribution as a Limiting Case of Binomial Distribution
    Exercises

Chapter 15
Sampling and Statistical Inference
    Infinite and Finite Populations
    Sampling
    Objects of Sampling
    Sampling Errors and Differences Between Population and Sample Measures
    Null Hypothesis and Level of Significance
    Standard Error and Sampling Distribution
    Theorems
    Standard Error of the Mean
    Interpretation of Standard Error
    Statistical Inference
    Small Samples
    SvmboPT* and V
    't' Distributions
    Uses of the 't' Distribution
    Significance of a Sample
    The 'Difference' Test
    Test of Difference between the Means of Two Samples
    Test of the Significance of an Observed Correlation Coefficient
    Z-test for Testing Significance of 'r'
    The Variance-Ratio Test or the F-test
    Chi-Square Test and Goodness of Fit
    Degrees of Freedom
    Conditions for the Application of χ² Test
    Yates' Correction
    Uses of χ² Test
    Exercises




Chapter 16
The Analysis of Time Series
    Secular Trend
    Periodic Changes
    Irregular or Random Fluctuations
    General Statement of the Nature of Time Series
    Measurement of Trend
    Centering a Moving Average
    The Method of Least Squares
    The Principle of Least Squares
    Normal Equations
    Computing the Trend
    Short Method of Arithmetic Straight Line
    Computation of Trend by Short Method: Odd Number of Years
    Explanations of Constants a and b
    Computation of Trend by Short Method: Even Number of Years
    Semi-Logarithmic or Geometric Straight Line Trend
    Logarithmic Straight Line
    Non-Linear Trends: Second Degree Curve
    Computation of Trend by Second Degree Parabola
    Third Degree Curve
    Elimination of Trend
    Method of Measuring Seasonal Fluctuations
    Cyclical Fluctuations
    Exercises

Chapter 17
Index Numbers
    The Construction of Index Numbers
    Simple Aggregate of Actual Prices
    The Simple Average of Relatives
    Comparison of Index Numbers
    The Time Reversal Test
    Weighting of Index Numbers
    The Weighted Aggregate of Price Index
    The Weighted Mean of Relatives Price Index
    Weighted Geometric Mean of Relatives
    Bias in Weighted Index Numbers
    Factor Reversal Test
    The Ideal Index
    Problems on Index Number Construction
    Fixed and Chain Base Indices
    The Method of Computation
    Merits and Demerits of the Chain Base Method
    The Choice of an Average
    Base Shifting, Splicing and Deflating
    Exercises

Chapter 18
Correlation and Regression
    Positive and Negative Correlation
    Causation and Correlation
    Methods of Studying Correlation
    The Scatter Diagram
    Karl Pearson's Coefficient of Correlation
    Coefficient of Correlation in Continuous Series
    Coefficient of Correlation for Historical Series
    Applied to Short Term Changes
    Method of Concurrent Deviation
    Graphic Method of Correlation
    Lag
    The Coefficient of Rank Correlation
    Advantages of This Method
    Regression Line
    The Standard Error of Estimate
    Interpretation of the Standard Error of Estimate
    Explained and Unexplained Variability
    Regression of X on Y
    Regression Coefficient and the Coefficient of Correlation
    Relationship Between b_xy, r, σ_x, σ_y
    The Point of Intersection of the Two Regression Lines
    Exercises





Chapter 19
Association of Attributes
    Dichotomy and Notation
    Positive and Negative Attributes
    Order of Classes
    Consistency of Data
    Association and Dissociation
    Comparison of Expected and Observed Frequencies
    Positive and Negative Association
    Method of Proportion
    Coefficient of Association
    Exercises

Chapter 20
Interpolation
    Methods of Interpolation
    Graphic Method
    Algebraic Method
    Binomial Expansion Method
    Lagrange's Method
    Exercises

Chapter 21
Vital Statistics
    Registration of Vital Facts
    Methods of Analysis of Vital Events
    Measures of Mortality
    Standard Death Rate
    Measures of Fertility
    Gross Reproduction Rate
    Net Reproduction Rate
    Life Tables
    The Construction of Life Table
    Exercises

Chapter 22
Statistical Quality Control
    Process Control
    Control Charts
    Calculation of Control Limits
    Examples of Different Charts
    Product Control
    Advantages of Statistical Quality Control
    Exercises

Chapter 23
Indian Statistics
    Historical Background
    Nature and Structure of the Indian Statistical Organisation
    Statistical Organisation at the Centre
    Statistical Organisation in the States
    Cabinet Secretariat
    Ministry of Commerce and Industry
    Ministry of Finance
    Ministry of Food and Agriculture
    Ministry of Home Affairs
    Ministry of Labour and Employment
    Ministry of Steel, Mines and Fuel
    Population Statistics
    The Problem of Quality in Census Data
    Estimating Intercensal Population
    Population Statistics in India
    Information Collected by the 1951 Census
    Main Features of the Census of 1961
    The Household Schedule
    Findings of 1961 Census
    Labour Statistics
    Employment
    Statistics of Unemployment
    Statistics of Absenteeism
    Statistics of Labour Turnover
    Trade Union Statistics
    Statistics of Industrial Injuries
    Statistics of Industrial Disputes
    Statistics of Wages





    Labour Bureau Index of Earnings of Factory Workers
    Agricultural Statistics
    Agricultural Statistics in India
    Statistics of Land Utilization
    Crop Acreage Statistics
    Chief Causes of Inaccuracies
    Livestock Statistics
    Forest Statistics
    Industrial Statistics
    Census of Manufacturing Industries
    Index Number of Industrial Production
    The New Revised Index of Industrial Production
    Trade Statistics
    Foreign Trade Statistics
    India's Foreign Trade Statistics
    Internal Trade
    The National Sample Survey
    National Income
    India's National Income
    Difficulties of Estimation in India
    Official Estimates of National Income by the N. I. U.
    Method of Estimation
    Social Accounting
    Use of Social Accounts
    Index Numbers
    Index Number of Agricultural Production
    Ministry of Food and Agriculture Series
    Groups
    The Reserve Bank of India Index of Agricultural Production
    The Eastern Economics Index
    The F.A.O. Index
    Index Numbers of Commodity Prices







    Official Index Number of Wholesale Prices (Revised Series)
    Coverage of the Revised Series
    Formula
    Problem of Continuity
    Consumer Price Index Numbers
    Precautions in the Use of a Consumer Price Index
    Problems in the Construction of Consumer Price Living Index Number
    Method of Compilation of Consumer Price Index
    Consumer Price Index Numbers in India
    New Series of Consumer Price Index of Industrial Workers
    Limitations of Consumer Price Index Numbers

Appendices
    Appendix I: Logarithms
    Appendix II: Selected References
    Appendix III: Index




Chapter 1
Introduction


Scientific and Statistical Methods

Knowledge is power. It is natural, therefore, that man
should have been in quest of knowledge ever since he
came into existence. In the earlier stages of human development
this desire was utilitarian in character, i.e., man tried to know
and discover only those things which satisfied his needs. But 
later on knowledge was sought for its own sake. Today it is no 
longer a necessity which compels him to discover the truth. He 
wants to discover it for the mere pleasure of doing so. 

Knowledge is obtained in many ways. In the beginning
'Perception' and 'Intuition' were the only methods known. In
some cases truth was supposed to be revealed in dreams also.
But the knowledge so obtained was poor both quantitatively
and qualitatively. The bulk of human knowledge, and particularly
of scientific knowledge, is the result of a plan and not the
product of chance. Truth is discovered by going through the
discovering process methodically and in a planned manner.
These methods are called Scientific Methods.

Scientific Methods are of two kinds: Technical and Logical.
Technical Methods are the methods of measuring the phenomenon
under investigation and manipulating the conditions
under which it can be observed with advantage. These methods
are different in different sciences. Logical Methods, on the
other hand, are the methods of reasoning according to the
nature of the available data. These two methods are complementary
and intimately connected with one another. As a
matter of fact, technical methods are mainly auxiliary to the
logical methods, and by themselves are not enough in any
scheme of investigation. They only render possible the observation
and measurement of phenomena which could not have
been otherwise observed or measured accurately, and provide
the material on which we can base our arguments.

Whereas technical methods are, for the most part, different
from one science to another, logical methods are common to
practically all the sciences and these are the only ones which
can be studied by all men of science (strictly so called) as well
as those who are not men of science.

Logical methods that commonly go by the name of scientific
methods are mainly the methods of inductive reasoning.
'Inductive reasoning mostly aims at the discovery of some
general truth from which certain observed facts might have
been inferred.' There are various ways in which generalisation
may be made. Which method of inductive reasoning is to be
applied depends upon the circumstances of each individual
case. But the methods of inductive reasoning can be fruitfully
applied only where the facts under investigation can be
adequately analysed and are capable of being examined under
varied conditions. These conditions may not always be satisfied.
The phenomenon under investigation may be too complicated.
It may not be possible to adequately analyse it or observe it
under sufficiently varied conditions. In such cases the methods
of inductive logic will not be of much assistance and recourse
may have to be had to what are known as Statistical Methods
or Statistics.

The Meaning of Statistics

The word Statistics is used in two senses. It is sometimes
used in the plural and sometimes in the singular. When used
in the plural it refers to numerical statements of facts. Thus
when we say that there are statistics in the Reserve Bank of
India Bulletin, what we really mean is that the Bulletin contains
certain facts which are stated numerically. The difference
between a statement in 'numerical form' and another one which
is not expressed numerically may be readily understood by an
illustration. The height of individuals may be described by
expressions like 180 cm., 155 cm., etc., or by expressions like
'tall', 'short', etc. The descriptions of heights given in the former
cases are numerical. Those in the latter are not so. It may,
however, be stated at the very outset that whereas 'statistics are
numerical statements of facts, all facts numerically stated are
not statistics'. In order that numerical data may be called
statistics they must possess certain characteristics. These
characteristics have been explained in detail in this chapter at
another place.

When this term (Statistics) is used in the singular it might
mean either 'The Methodology of Statistics' or 'The Theory of
Statistics'. Methodology of Statistics (or Statistical Methods)
refers to all methods and techniques which are applied in
collecting and converting a mass of unwieldy figures into easily
understandable statements of facts. It includes all those
methods that are used in collecting, presenting, analysing and
interpreting quantitative data. Viewed in this sense statistical
methods are synonymous with manufacturing processes. Just
as many operations are to be carried out in converting raw
cotton into cloth, similarly many statistical methods are applied
before any conclusions can be drawn from quantitative
data. The methods commonly used are: sampling, classification,
tabulation, diagrammatic and graphic representation,
condensation, measures of variability, index numbers and
correlation coefficients.

The theory of Statistics is the body of principles that has 
been developed to serve as a guide for sound statistics and
sound statistical methods, and which underlies and justifies 
the formulas used in the application of statistical methods to 
various fields of inquiry. Thus it can be said that by theory 
of statistics we mean the exposition of statistical methods. 

Definitions of Statistics 

Statistics has been defined variously by different writers.
The reasons for the difference are mainly two. First, the
scope of statistics now is considerably wider than what it was
some time back. Formerly statistics was conceived as a statecraft
only. In modern times, it pervades all departments of
inquiry. Naturally, therefore, the earlier definitions are
narrow and are not as comprehensive as those given by modern
writers. Secondly, some writers have defined it as 'Statistical
Data' (Statistics used in plural), whereas others have defined
it as 'Statistical Methods'. As a matter of fact statistics
should be defined in both senses, i.e., as 'statistical data', and
also as 'statistical methods'. Statistics (plural) or statistical
data have been defined by Webster as 'Classified facts respecting
the condition of the people in a state ... especially those
facts which can be stated in numbers or in tables of numbers
or in any other tabular or classified arrangement.' No doubt
this definition was correct at a time when statistics were
collected only by the Government and only for purposes of
internal administration or for knowing for purposes of war the
wealth of the state, human as well as non-human. The scope of
statistics is now considerably wider and it has almost a universal
application. Obviously, therefore, this definition is inadequate.

Bowley defines statistics as 'Numerical statements of facts in
any department of inquiry placed in relation to each other.'
This is somewhat more accurate. It means that if numerical
facts do not pertain to a department of inquiry or if such facts
are not related to each other they cannot be called statistics.
This leads us to the conclusion that 'all statistics are numerical
facts but all numerical facts are not statistics'. This definition
is certainly more comprehensive than the previous one. But
a much better definition is provided by Secrist, who defines
statistics as 'Aggregates of facts, affected to a marked extent
by a multiplicity of causes, numerically expressed, enumerated
or estimated according to reasonable standards of accuracy,
collected in a systematic manner for a predetermined purpose,
and placed in relation to each other.' This definition specifies
certain characteristics which numerical data must possess if
they are to be called statistics. Let us pause a while to examine
these characteristics.

Statistics are aggregates of facts

This means that statistics are a 'number of facts'. A single
fact, even though numerically stated, cannot be called statistics.
'A single death, an accident, a sale, a shipment does not
constitute statistics. Yet numbers of deaths, accidents, sales and
shipments are statistics.' Observe carefully the following two
charts containing information about the population of India.
Chart I states the population only for one year whereas Chart
II gives population figures for six different periods. The data
given in Chart II are statistics whereas those given in Chart I
are not so for the simple reason that they are 'single' solitary
facts.



        CHART I                        CHART II

Year      Population            Year      Population
          (in lakhs)                      (in lakhs)

1951      3,569                 1901      2,355
                                1911      2,490
                                1921      2,481
                                1931      2,755
                                1941      3,128*
                                1951      3,569
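The point that an aggregate of figures permits comparison, while a solitary figure does not, can be illustrated with a short sketch in Python. It assumes the six Chart II figures as read above; the function name `decade_changes` is introduced here only for illustration and is not part of the original text.

```python
# Population of India in lakhs, as listed in Chart II.
chart_ii = {1901: 2355, 1911: 2490, 1921: 2481, 1931: 2755, 1941: 3128, 1951: 3569}


def decade_changes(series):
    """Return (start_year, end_year, percent_change) for each consecutive pair of years."""
    years = sorted(series)
    changes = []
    for start, end in zip(years, years[1:]):
        pct = (series[end] - series[start]) / series[start] * 100
        changes.append((start, end, round(pct, 1)))
    return changes


# Six figures yield five comparisons; a single figure yields none.
for start, end, pct in decade_changes(chart_ii):
    print(f"{start}-{end}: {pct:+.1f}%")
```

Passing a one-entry series, such as the single figure of Chart I, returns an empty list: there is nothing to compare it with, which is the sense in which a solitary fact is not statistics.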


Statistics are affected to a marked extent by a
multiplicity of causes

This means that statistics are aggregates of such facts only
as grow out of a 'variety of circumstances', that is, when their size,
shape or form at any particular moment is the result of the
action and interaction of a number of forces, differing amongst
themselves, and it is not possible to say how much of it is
due to any one particular cause. Thus the volume of wheat
production is attributable to a number of factors, viz., rainfall,
soil, fertility, quality of seed and methods of cultivation, etc.
All these factors acting jointly determine the amount of the
yield and it is not possible for anyone to assess the individual
contribution of any one of these factors.

* After deducting estimated amount of inflation of returns in West Bengal
and Punjab (20 lakhs).






Statistics must be enumerated or estimated according
to standards of accuracy

This means that if aggregates of numerical facts are to be
called 'statistics' they must be reasonably accurate. This is
necessary because statistical data are to serve as a basis for
statistical investigations. If the basis happens to be incorrect
the results are bound to be misleading. It must, however, be
clearly stated that it is not 'mathematical accuracy' but only
'reasonable accuracy' that is necessary in statistical work. What
standard of accuracy is to be regarded as reasonable will
depend upon the aims and objects of inquiry. Where precision
is required accuracy is necessary; where general impressions
are sufficient, appreciable errors may be tolerated. Again,
whatever standard of accuracy is once adopted it should be
uniformly maintained throughout the inquiry.
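The requirement that one standard of accuracy be maintained throughout an inquiry can be sketched in a few lines of Python. The figures below are hypothetical and the helper `round_to_unit` is an assumption made for illustration; the sketch only shows every figure being brought to the same unit by the same rule.

```python
def round_to_unit(figures, unit):
    """Round each non-negative whole-number figure to the nearest multiple of `unit`.

    Halves are rounded upwards, so the same rule applies to every figure.
    """
    return [((value + unit // 2) // unit) * unit for value in figures]


# Hypothetical production figures in tonnes (not taken from the text).
raw_figures = [12480, 9515, 20500]

# One standard of accuracy, the nearest thousand tonnes, applied to all figures.
print(round_to_unit(raw_figures, 1000))
```

Rounding some figures to the nearest thousand and others to the nearest ten within the same table would suggest a precision that the inquiry as a whole does not possess.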

Statistics are collected in a systematic manner for a
predetermined purpose

Numerical data can be called statistics only if they have
been compiled in a properly planned manner and for a purpose
about which the enumerator had a definite idea. So long as
the compiler is not clear about the object for which facts are
to be collected he will not be able to distinguish between facts
that are relevant and those that are unnecessary; and as such
the data collected will in all probability be a heterogeneous
mass of unconnected facts. Again, the procedure of data collection
must be properly planned, i.e., it must be decided
beforehand as to what kind of information is to be collected
and the method that is to be applied in obtaining it. This
involves decisions on matters like 'statistical unit', 'standard of
accuracy', 'list of questions', etc. Facts collected in an unsystematic
manner and without a complete awareness of the object
will be confusing and cannot be made the basis of valid
conclusions.

Statistics should be placed in relation to each other

Numerical facts may be placed in relation to each other
either in point of time, space or condition. The phrase
'placed in relation to each other' suggests that the facts should
be comparable. Facts are comparable in point of time when
we have measurements of the same object, obtained in an
identical manner, for different periods. They are said to be
related in point of space or condition when we have the
measurements of the same phenomenon at different places or
in different conditions, but at the same time. Numerical facts
will be comparable if they pertain to the same inquiry and
have been compiled in a systematic manner for a predetermined
purpose.

Statistics (singular) or Statistical Methods have been
defined by Bowley as 'the science of measurement of the social
organism, regarded as a whole, in all its manifestations.' This
definition is too narrow inasmuch as it confines the scope of
Statistics only to human activities. Statistics in fact has a
much wider application and is not confined only to the social
organism. Besides, statistics is not only the technique of
measuring but of analysing and interpreting also. Again,
statistics, strictly speaking, is not a science but a scientific
method. It is a device of inferring knowledge and not knowledge
itself.

Bowley has also called statistics ‘the science of counting’,
and ‘the science of averages’. These definitions are again
defective in the sense that they pertain to only a limited field.
True, statistical work includes counting and averaging, but it
also includes many other processes of treating quantitative
data. In fact, while dealing with large numbers, actual count
becomes illusory and only estimates are made. Thus these
definitions can also be discarded on the ground of inadequacy.

King defines statistics as ‘the method of judging collective
natural or social phenomena from the results obtained by the
analysis of an enumeration or collection of estimates.’ This
definition gives equal importance to the analytical and
inferential aspects of statistics and as such it may be adopted by us
as the best description of the subject.




Statistical Analysis and Inference

Statistical methods consist of two parts: (a) statistical
analysis, and (b) statistical inference. The former generally
describes the group characteristics of the particular data
observed, and the latter describes the judgments based on
statistical analysis. Of the two, statistical inference ‘is much
more fascinating, has greater appeal to the imagination, calls
for a keener sense of logic and makes a greater contribution
towards an understanding of the statistical nature of the
universe.’ The real significance of these two aspects of
statistical methods can be easily understood by the following
example. If it is desired to study the problem of infant
mortality in a particular area, the first thing that is to be done
is to analyse the various causes of death and to study the
impact of each one of these causes on the various classes of
people, viz., upper class, upper middle class, lower middle
class and poor class. This kind of analysis will give us an
insight into the problem, and we will be able to know from
such an analysis that the rate of infant mortality among the
poor is much higher than among the rich and that more deaths
are caused by malaria than by any other disease. Such an
analysis leads us to the conclusion that the poorer sections of
our community are not able to offer adequate resistance to
malaria and that they live in unhealthy surroundings. The
former is the process of analysis whereas the latter is that of
statistical inference.

Origin of Statistics 

Statistics originated from two quite dissimilar fields, viz.,
games of chance and political states. These two different
fields are also termed as two distinct disciplines, one primarily
analytic and the other essentially descriptive. The former is
associated with the concept of chance and probability and the
latter is concerned with the collecting of data.

The theoretical development of the subject has its origin
in the mid-seventeenth century and many mathematicians and
gamblers of France, Germany and England are credited
with its development. Notable amongst them are Pascal
(1623-1662), who investigated the properties of the coefficients
of binomial expansion, James Bernoulli (1654-1705), who wrote
the first treatise on the theory of probability, De Moivre (1667-
1754) and Gauss (1777-1855).

As regards the descriptive side of statistics it may be stated
that statistics is as old as statecraft. Since time immemorial
men must have been compiling information about wealth and
man-power for purposes of peace and war. This activity
considerably expanded at each upsurge of social and political
development and received added impetus in periods of war.

The development of statistics can be divided into the 
following three stages : 

The Empirical Stage (down to 1600)

During this, the primitive stage of the subject, numerical 
facts were utilized by the rulers, principally as an aid in the 
administration of Government. Information was gathered 
about the number of people and the amount of property held 
by them, the former serving the ruler as an index of human
fighting strength and the latter as an indication of actual and
potential taxes.

The Comparative Stage (1600-1800)

During this period statisticians frequently made comparisons
between nations with a view to judge their relative strength
and prosperity. In some countries inquiries were instituted
to judge the economic and social conditions of their people.
Colbert introduced in France a ‘mercantile’ theory of Government
whose basis was essentially statistical in character. In
1719 Frederick William I began gathering information about 
population, occupation, house-taxes, city finances, etc., which 
helped to study the social condition of the people. 

The Modern Stage (1800 up to date)

During this period statistics is viewed as a way of handling
numerical facts rather than a mere device of collection of
numerical data. Besides, there has been a considerable extension
of the field of its applicability. It has now become a
useful tool in the world of affairs and statistical methods of
analysis are now being increasingly used in Biology, Psychology,
Education, Economics and Business.



Chapter 2 

Functions, Importance and Limitations 


‘The fundamental gospel of statistics is to push back the domain of
ignorance, prejudice, rule of thumb, arbitrary or premature decisions,
traditions and dogmatism, and to increase the domain in which decisions
are made and principles are formulated on the basis of analysed
quantitative facts.’ (Robert W. Burgess)

Functions of Statistics

‘The proper function of statistics’, according to Bowley,
‘is to enlarge individual experience’. Besides adding to our
knowledge of complex phenomena, statistics lends precision to
our ideas that would otherwise remain vague and indeterminate.
Our knowledge of such things as ‘national income’,
‘population’, ‘natural resources’, etc., would not have been so
definite and precise if there were no reliable statistics pertaining
to each one of these objects. To say that the per capita
income in India is low, is a vague statement. ‘Low’ to one
individual may mean one thing while to another it might
mean something altogether different. I may take it to be
near about Rs. 100 while someone else may think it to be in
the neighbourhood of Rs. 500. But the moment we say that
our per capita income is Rs. 250 we make a statement which
is precise and convincing. Again, a statement that the per
capita income in the agricultural sector is lower than in the
industrial sector, is vague and indefinite. But if the per capita
incomes for both these sectors are ascertained, the comparison
would be easier and even a layman would be able to appreciate
the difference in the productivity of these two occupations. It
can thus be said that ‘Statistics increases the field of mental
vision as an opera glass or telescope increases the field of
physical vision.’ Statistics is able to widen our knowledge
because of the following services that it renders:





1. It presents facts in a definite form. It is the quality of
definiteness which is responsible for the growing universal
application of statistical methods. The conclusions stated
numerically are definite and hence more convincing than
conclusions stated qualitatively. This fact can be readily
understood by a simple example. In an advertisement,
statements expressed numerically have greater attraction and are
more appealing than those expressed in a qualitative manner.
The caption, ‘We have sold more cars this year’, is certainly
less attractive than ‘Record Sale of 10,000 cars in 1962 as
compared to 6,000 in 1961’. The latter statement emphasises
in a much better manner the growing popularity of the
advertiser’s cars.

2. Statistics simplifies unwieldy and complex mass of data and
presents them in such a manner that they at once become
intelligible. The complex data may be reduced to totals,
averages, percentages, etc., and presented either graphically
or diagrammatically. These devices help us to understand
quickly the significant characteristics of the numerical data,
and consequently save us from a lot of mental strain. Single
figures in the form of averages and percentages can be grasped
more easily than a mass of statistical data comprising
thousands of facts. Similarly, diagrams and graphs, because of their
greater appeal to the eye and imagination, render valuable
assistance in the proper understanding of numerical data.
Time and energy of business executives are thus economised if
the statistician supplies them with the results of production,
sales and finances in a condensed form.

3. Statistics classifies numerical facts. The procedure of
classification brings into relief the salient features of the
variable that is under investigation. This can be clearly
illustrated by an example. If we are given the marks in mathematics
of each individual student of a class and if it is desired to
judge the intelligence of the class on the basis of these data it
will not be an easy matter. Human mind has its limitations
and cannot easily grasp a multitude of figures. But if the
students are classified, i.e., if we put into one group all those
boys who get more than second division marks, in still another
group those who get third division marks, and have a separate
group of those who fail to get pass marks, it will be easier for
us to form a more precise idea about the intelligence of the
class.

4. Statistics furnishes a technique of comparison. The facts,
having been once classified, are now in a shape when they can
be used for purposes of comparisons and contrasts. Certain
facts, by themselves, may be meaningless unless they are
capable of being compared with similar facts at other places or
at other periods of time. We estimate the National Income of
India not essentially for the value of that fact itself, but mainly
in order that we may compare the income of today with that
of the past and thus draw conclusions as to whether the
standard of living of the people is on the increase, on the decrease
or is stationary. Statistics affords suitable technique for comparisons.
It is with the help of statistics that the cost accountant is able
to compare the actual accomplishment (in terms of cost) with
programmes laid out (in terms of standard cost). Some of the
modes of comparison provided by statistics are: Totals, Ratios,
Averages or Measurements of Central Tendencies; Graphs and
Diagrams; and Coefficients. Statistics thus ‘serves as a scale
in which facts in various combinations are weighed and
valued.’

5. Statistics endeavours to interpret conditions. Like an artist,
statistics renders useful service in presenting an attractive
picture of the phenomena under investigation. But it frequently
does far more than this by enabling the interpretation of
conditions, by developing possible causes for the results described.
If the production manager discovers that a certain machine is
turning out some articles which are not up to the standard
specifications, he will be able to find out statistically if this
condition is due to some defect in the machine or whether
such a condition is normal.

Importance of Statistics

Statistical methods have become useful tools in the world
of affairs. Economy and a high degree of flexibility are the
important qualities of statistical methods that render them
specially useful to businessmen and scientists.

Statistics and Business 

The need for statistical information in the smooth
functioning of an undertaking increases along with its size. The
bigger the concern the greater is the need for statistics. In the
era preceding the Industrial Revolution the master craftsman
was in intimate touch with the sources of the supply of raw
materials. He worked in his own home with the help of the
members of his family and a few other employees whom he
knew rather well. His customers were few and he knew them
all personally. Thus he had almost all information about his
business and obviously no technique for the supply of this
information was necessary.

Today also, in an era of mass production technology, the 
business executive needs all such information for the successful 
conduct of affairs. But he cannot, even if he were to try, get 
this information in the same manner as the master craftsman 
did. Naturally, therefore, resort is had to the statistical
technique and statistics takes the place of personal observation.
‘For better or worse, the modern business executive is largely
dependent on statistical data and methods of analysis for
essential information.’

No business, large or small, public or private, can flourish
in these days of large-scale production and cut-throat
competition without the help of statistics. Statistical information is
needed from the time the business is launched till the time
of its exit. At the time of the floatation of the concern facts
are required for the purposes of drawing up the financial plan
of the proposed unit. All those factors that are likely to affect
judgment on these matters will be quantitatively weighed and
statistically analysed before taking any decisions.

A shrewd manufacturer must know in advance ‘how much
is to be produced’, ‘how many workers and how much raw
material will be needed to produce that estimated quantity’
and ‘what quality, type, size, colour or grade of the product is
to be manufactured’. In short, he must have a production
plan. Now such a plan, requiring all the details given above,
cannot be framed without quantitative facts. Statistics is
thus a tool of production control.

Quantitative data will have to be collected and analysed if 
a workable personnel plan is to be carried out. The only route 
for the personnel officer or labour officer to get acquainted with 
the labour force numbering hundreds, or thousands, or even 
lakhs, is to know its members through statistical analysis of 
information largely quantitative. Wage levels and wage stan¬ 
dards also require the statistical study of different jobs within 
the same organisation and the study of wages in like business 
undertakings. 

In a labour dispute it is the official of the union that 
generally represents the workmen. It is through statistical 
data that the man representing the workers, knows about the 
working conditions, rates of wages, frequency of lockouts, 
monthly earnings and such other matters in the industry where 
the dispute has arisen as well as in other industries. Again, in 
negotiation conferences, proper data competently and honestly 
collected and analysed may lead to an early and just solution 
of the differences. 

Statistical methods of analysis are helpful in the marketing 
function of an enterprise through its enormous help in market 
research, advertisement campaigns and in comparing the sales 
performances. Statistics also directs attention towards the 
effective use of the advertising funds. 

Above all, statistical methods of analysis provide an
important tool to the management for cost and budgetary control.
The most elementary use to the management is in the balancing
of the activities of one part of a system against those of another,
to secure that supplies equal requirements and that there are
no ‘bottlenecks’ or parts that are not employed to the full.
Statistics is thus a useful tool in the hands of the management.
But it must be remembered that no volume of statistics
can replace the knowledge and experience of the executives.
Statistics supplements their knowledge with more precise facts
than were hitherto available.

Statistics and Economics

Statistical data and methods of statistical analysis render
valuable assistance in the proper understanding of the economic
problems and the formulation of economic policy. Economic
problems almost always involve facts that are capable of being
expressed numerically, e.g., volume of trade, output of
industries (manufacturing, mining and agriculture), wages, prices,
bank deposits, Clearing House returns, etc. These numerical
magnitudes are the outcome of a multiplicity of causes and are
consequently subject to variations from time to time, or
between places or among particular cases. Accordingly the study
of economic problems is specially suited to statistical treatment.

Let us take an example to clarify this point. A proper
appreciation of the nature and magnitude of the problem of
unemployment would necessitate a knowledge about the
following: Is unemployment increasing or decreasing? Is it
widespread or largely confined to certain areas? Does it affect
the educated and uneducated alike or is it more pronounced in
any particular class? Which industries are expanding and
which are contracting? Has there been any remarkable
increase in the population? All these questions can be answered
statistically, and the resultant data will enable us to form a
correct estimate of the problem. It is natural, therefore, that
there is a growing emphasis on the importance of collecting
systematic and regular statistics bearing on every aspect of our
economy. Mere collection of data, however, is not enough.
The complexity of all such problems makes it imperative that
the collected data be condensed and analysed: condensed in
order that it may be possible for limited human faculties to
handle them, analysed in order that the elements in the problem
may be distinguished and their significance appreciated.

A statistical approach to an economic problem not only
leads to its correct description but also indicates lines along
which it is to be tackled. The great emphasis that was laid
upon the development of agriculture by the Planning
Commission in the First Five-Year Plan can be defended with the
help of factual data bearing on our economy. In a country
where agriculture contributes nearly 50 per cent of its total
income, and where the net output (in agriculture) per engaged
person is the lowest, being only Rs. 500 per year, it is in the
fitness of things that the Planners have laid so much emphasis
on the improvement of our agriculture.

Apart from economic policy, the development of economic 
theory has also been facilitated by the use of statistics. The 
complexity of modern economic organisation has rendered 
deductive reasoning inadequate and difficult. Statistics is now 
being used increasingly not only to develop new economic 
concepts but also to test the old ones.

Limitations of Statistics

That statistical technique, because of its flexibility and 
economy, is growing in popularity and is being successfully 
employed by seekers of truth in numerous fields of learning is 
a fact that cannot be denied. But it is not without limitations. 
It cannot be applied to all kinds of phenomena and cannot be 
made to answer all our queries. 

This is due, in the first instance, to the fact that statistics 
deals with only those subjects of inquiry which are capable of 
being quantitatively measured and numerically expressed. 
This is an essential condition for the application of statistical
methods. 

Now all subjects cannot be expressed in numbers. Health,
poverty, intelligence (to name only a few) are instances of the
objects that defy the measuring rod, and hence are not suitable
for statistical analysis. It is true that efforts are being made
to accord statistical treatment to subjects of this nature also.
Health of the people is judged by a study of its death rate,
longevity of life and the prevalence of any disease or diseases.
Similarly, intelligence of the students may be compared on the
basis of marks obtained by them in a class test. But these are
only indirect methods of approaching the problem and
subsidiary to quite a number of other considerations which cannot
be statistically dealt with.

Again statistics deals only with aggregates of facts and no
importance is attached to individual items. It is, therefore,
suited to only those problems where group characteristics are
desired to be studied. But where the knowledge about individual
cases is necessary statistical technique proves inadequate.
The per capita consumption of food grains in a state will 
camouflage cases of starvation if any. The scarcity felt by the 
poorer section may be more than made up by the extravagance 
of the rich. In such cases, therefore, statistics will fail to 
reveal the real position. 
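
The point can be checked with a small calculation. The sketch below uses
invented consumption figures and an assumed minimum requirement (both are
hypothetical, chosen only for illustration and not taken from any survey)
to show a per capita figure that looks adequate while several individuals
fall far below it.

```python
from statistics import mean

# Hypothetical monthly food-grain consumption (kg per person) in a small
# community; the figures are invented for illustration only.
consumption_kg = [2, 3, 3, 4, 30, 32, 38]

# Assumed minimum requirement per person, also hypothetical.
MINIMUM_KG = 12

# The aggregate view: one figure for the whole group.
per_capita = mean(consumption_kg)

# The individual view: the persons the aggregate figure hides.
below_minimum = [c for c in consumption_kg if c < MINIMUM_KG]

print(f"Per capita consumption: {per_capita:.1f} kg")
print(f"Persons below the minimum: {len(below_minimum)} of {len(consumption_kg)}")
```

The per capita figure exceeds the assumed minimum, yet four of the seven
persons consume far less than it; the average alone gives no hint of this.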

Another defect of statistics lies in the fact that statistical
data are only approximately and not mathematically correct.
Greater and greater emphasis is being laid on the sampling
technique of collecting the data. This means that by observing
only a limited number of items we make an estimate of the 
characteristic of the entire population. This system works 
well so long as mathematical accuracy is not essential. But 
when exactness is essential statistics will fail to do the job. 
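
A rough sketch of what such an estimate looks like is given below. The
population is generated artificially (the income figures, the population
size of 10,000 and the sample size of 200 are all assumptions made for
illustration), so that the estimate obtained from a limited number of
items can be set beside the true figure.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so that the sketch is repeatable

# Hypothetical population: incomes of 10,000 persons (invented figures).
population = [random.randint(100, 500) for _ in range(10_000)]

# Observe only a limited number of items, chosen at random.
sample = random.sample(population, 200)

population_mean = mean(population)
sample_mean = mean(sample)

print(f"Population mean: {population_mean:.2f}")
print(f"Sample estimate: {sample_mean:.2f}")
print(f"Error of estimate: {sample_mean - population_mean:+.2f}")
```

The estimate should come close to the population figure without, in
general, coinciding with it, which is the sense in which statistical
results are approximately rather than mathematically correct.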

Statistics can be used to establish wrong conclusions and,
therefore, can be used only by experts. This is discussed in
detail under the head ‘Distrust of Statistics’.

Distrust of Statistics

In spite of the very valuable service that statistics renders 
to business community and to scientists, both social and 
natural, there is some amount of misgiving, in the minds of a 
few people with regard to their reliability and usefulness. 
This feeling has been given expression in a number of ways of
which the following are the often quoted examples:

‘There are three kinds of lies, namely, lies, damned lies,
and statistics, wicked in the order of their naming.’

‘With statistics anything can be proved.’

Such misgivings can be attributed to mainly two causes:
(i) figures carry conviction and are capable of being easily
manipulated; (ii) the presence in this world of persons who
are selfish and unscrupulous.

If an argument is supported by facts stated numerically it
has a much greater appeal than the one without them. It is
because of this that in a discussion the winner is almost
invariably he who is able to substantiate his point with facts and
figures. Naturally, therefore, if a lie is to be pushed through
the best way is to give figures in the support of it. Statistical
data do not carry the label of their quality and as such can be
manipulated in any desired manner without causing the least
suspicion. Statistical data and conclusions may be manipulated
in any of the following ways:

(a) Shifting definitions. A slight alteration in the
definition of a key term might provide a basis for conclusions
which are not warranted by facts. Thus while making a
comparison of the number of workers employed in two industrial
units misleading conclusions may be obtained if the meaning
assigned to the term ‘workers’ is not identical while conducting
the census of workers in the two units. In one case ‘workers’
may include casual workers also while in the other they may
be excluded.

(b) Methods of selecting cases. If it is desired to ascertain
the number of school age children per family it may be done
in either of the two ways: (i) by questioning a number of
children in schools, and (ii) by conducting a survey of a
number of families. It is quite possible that the result obtained
by the former method may be different from the result obtained
by the latter method. If we take a simple case of two families,
one with a single school age child, and the other with six, the
number of school age children per family would be 3.5 under
the second method. But if we apply the first method the
result would be higher, i.e., if each of the seven children were
asked the number of school age children in his family, the total
of seven replies would be 37 and the average 5.286 children
per family.

(c) Inappropriate comparison. Wrong conclusions may be
obtained by comparing ‘statistics’ which are not essentially
comparable. Thus if the items included in the construction
of a ‘price index’ have changed over a period of time the
comparison of the price index as it was in the beginning of
the period with that at the end of the period would lead to
misleading results.

(d) Misinterpretation of association or correlation.
Sometimes a certain degree of association or correlation may be
apparent from a set of figures when none actually exists.
A statement ‘those who drink die before reaching the age of
100 years and hence drinking is harmful for longevity’ is a
case in point. Unless it is shown that those who do not drink
live up to 100 years of age or to a higher age than those who
drink, the conclusion contained in the statement is fallacious.

(e) Inadequate sample. Statistical data may lead to
misleading results if the investigator jumps to a conclusion on the
basis of too small a sample or one which does not represent the
whole population adequately. Thus if a coin is tossed four
times and the head turns up thrice the conclusion that the
probability of getting a head is 0.75 is wrong because it is based
on an inadequate sample. The size of the sample should be
sufficiently large and it must be representative of the population.
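
Two of the cases above, (b) and (e), lend themselves to a quick
arithmetical check. The sketch below first reproduces the figures of the
two-family example, and then works out how often a perfectly fair coin
would show three or more heads in four tosses. The assumption of a fair
coin, the helper name `prob_at_least` and the comparison with 400 tosses
are introduced here for illustration and are not part of the original
example.

```python
from math import comb
from statistics import mean

# --- (b) Methods of selecting cases ---
# The two families of the example: one school age child, and six.
children_per_family = [1, 6]

# Survey of families: one reply per family.
family_survey_average = mean(children_per_family)

# Questioning children in school: a family with n children gives n
# replies, each of them reporting n.
child_replies = [n for n in children_per_family for _ in range(n)]
school_survey_average = mean(child_replies)

print(family_survey_average)            # 3.5
print(sum(child_replies))               # 37
print(round(school_survey_average, 3))  # 5.286

# --- (e) Inadequate sample ---
def prob_at_least(heads: int, tosses: int) -> float:
    """Chance of at least `heads` heads in `tosses` tosses of a fair coin."""
    favourable = sum(comb(tosses, k) for k in range(heads, tosses + 1))
    return favourable / 2 ** tosses

# Three or more heads in four tosses is an ordinary result for a fair coin.
small_sample = prob_at_least(3, 4)

# The same proportion of heads (75 per cent) in 400 tosses would be
# extraordinarily rare if the coin were fair.
large_sample = prob_at_least(300, 400)

print(small_sample)  # 0.3125
print(large_sample)
```

A fair coin gives three or more heads in four tosses nearly one time in
three, so such a result is no ground for taking the probability of a head
to be 0.75; only in a large sample would the same proportion be evidence
against a fair coin.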

Now if such manipulated figures are quoted in support of
one’s point of view they are likely to mislead people. And
since there is not a great scarcity of such selfish and unscrupulous
people who do not hesitate in distorting facts, statistics are
used to prove the worst lies.

But if statistics are made to prove anything, it is certainly
not the fault of statistics as such but of the person who makes 
them the tool of his personal aggrandizement. 

It must, therefore, be clearly stated that the useful
application and co-ordination of statistics is not exclusively
dependent upon the degree of skill, ability and special experience of
the statistician. No matter how well the statistical techniques
are used in analysing, representing or interpreting the
information, the conclusions so derived may become unreliable and
even useless, if the enumerators or investigators are biased and
prejudiced.




‘The statistical method, like scientific methods in general, is
based on certain fundamental principles; it is not entirely
automatic in its operation, and progress in knowledge depends
to a considerable degree on the personal qualities of the
investigator. He must be creative of ideas, yet should strike a nice
balance between being too farfetched and fanciful on the one
hand and being so conservative on the other that he impedes
progress by his unwillingness to admit new knowledge and
ideas.’ (Tippett)



Chapter 3 
Statistical Inquiries 


By ‘inquiry’ we mean ‘a search for knowledge’. Statistical
inquiry, therefore, implies a search conducted according to
the statistical technique. The technique of statistics, however,
cannot be applied to all kinds of phenomena. Its application
is restricted to only those subjects which can be measured
quantitatively. It is possible to know statistically as to
whether the distribution of wealth in a particular country is
equitable or not for the simple reason that both wealth and
the concept of equity are capable of being expressed
numerically. But if it is desired to know whether such a distribution
is justifiable, statistical methods will not be of much assistance
as ‘justifiability’ cannot be quantitatively measured.

Statistical approach to a problem may broadly be summa¬ 
rised under the following four heads : 

1. Collection of facts, 

2. Organisation of facts, 

3. Analysis of facts, and 

4. Interpretation of facts. 

A detailed discussion of the various methods of collection, 
presentation, analysis and interpretation of facts is given at
appropriate places. Here the intention is to give only a bird's 
eye view of the entire statistical procedure. 

1. Collection of facts is the first step in the statistical
treatment of a problem. Numerical facts are the raw materials
upon which the statistician is to work and just as in a
manufacturing concern the quality of a finished product depends,
inter alia, upon the quality of the raw material, in the same
manner the validity of statistical conclusions will be governed,
among other considerations, by the quality of data used.
Assembling of the facts is thus a very important process and
no pains should be spared to see that the data collected are
accurate, reliable and thorough. There are four different
methods of collecting the data, viz., (i) Library method,
(ii) Experimental method, (iii) Observation, and
(iv) Questionnaire method.

All these methods of collecting the data may be broadly
divided into (i) Primary and (ii) Secondary, and will be explained
at a later stage. One thing that should be noted here is that 
the work of collecting facts should be undertaken in a planned 
manner. Without proper planning the facts collected may 
not be suitable for the purpose and a lot of time and money 
may be wasted. 

2. The data so collected will more often than not be a 
huge mass of facts running into hundreds and thousands of 
figures. Human mind has its limitations. No one can appre¬ 
ciate at a glance or even after a careful study hold in mind the 
information contained in a hundred or a thousand schedules. 
For a proper understanding of the data their irregularities 
must be brushed off and their bulk be reduced, i.e., some 
process of condensation must take place. Condensation 
implies the organisation, classification, tabulation and presenta¬ 
tion of the data in a suitable form. 

3. The process of statistical analysis is a method of
abstracting significant facts from the collected mass of numerical
data. This process includes such things as ‘measures of central
tendency’ (the determination of Mean, Median and Mode),
‘measures of dispersion’ and the determination of trends and
tendencies, etc. This is more or less a mechanical process
involving the use of elementary mathematics.

4. The interpretation of the various statistical constants
obtained through a process of statistical analysis is the final
phase or the finishing process of the statistical technique. It
involves a study of those methods by which judgments are formed
and inferences obtained. To make estimates of the population
parameters on the basis of sample statistics is an example of the
problem of interpretation. For the interpretation of results a
knowledge of advanced mathematics is essential.
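
The mechanical character of the third head, analysis, may be seen from
a short sketch. The marks below are invented for illustration (they are
not data from any inquiry), and the standard deviation is used here as
one possible measure of dispersion among several.

```python
from statistics import mean, median, mode, pstdev

# Hypothetical marks obtained by nine students (invented for illustration).
marks = [35, 40, 40, 45, 50, 55, 40, 60, 67]

average = mean(marks)      # arithmetic mean
middle = median(marks)     # middle value when the marks are arranged in order
most_common = mode(marks)  # value that occurs most often
spread = pstdev(marks)     # standard deviation, one measure of dispersion

print(average, middle, most_common, round(spread, 2))
```

Each constant is obtained by a fixed rule of calculation; what the
constants signify for the class as a whole is a matter of interpretation,
which belongs to the fourth head.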



Chapter 4 

Collection of Data (i)


Preliminary Consideration 

There are two things which must be carefully considered
before starting the work of data collection. The first is the 
statement of the purpose of the inquiry in very clear and 
unambiguous terms ; and the second is the formalisation of a 
plan of data collection. 

1. Statement of the Purpose

The first and the most essential thing which the statistical
investigator must do is to prepare a statement of the
purpose of the statistical inquiry in hand. Failure to work out
a statement of purpose clearly and carefully can lead only to
misunderstanding and confusion. It will result in diffusion
of effort, gradual over-expansion of the field covered and
aimlessness. The purpose of a statistical inquiry may be
either (i) to supplement, disprove or simply to test some theory
or hypothesis which is current, or (ii) to discover a new
theory or hypothesis, or (iii) to know the existing state of
affairs, or (iv) to solve a problem involving the inter-relations
of several groups of facts. A statistical inquiry may, for
example, be concerned with the problem of unemployment.
Now there are various aspects of this problem: urban or rural
unemployment, educated unemployment or unemployment of
the uneducated, partial or full unemployment. So long as the
investigator does not know the objects of his inquiry he will
not be sure as to what facts are to be collected.

It is imperative that the statement of purpose be
precise. A general statement is inadequate. Even a slight
variation in the purpose of inquiry may require entirely or
partially different statistical data. Thus, if an economist
desires to study and compare the scale of wages in different
localities, he must be very clear about the following points :

(a) Whether the requirements of his problem demand
information on wage rates or actual earnings ?

(b) If on wage rates, should it refer to rates approved
by collective negotiations or to rates actually paid ? Should
adjustments be made for overtime, undertime, bonuses and
fines ?

(c) Whether supervisory and clerical workers are to be
included ?

(d) Whether receipts in kind should be added to wage rates ?

Having ascertained the purpose of the inquiry, it is desirable
for the investigator to acquire some general information
about the thing he is investigating. If the purpose of the
inquiry, for example, is to determine the best site and location
of a proposed plant in a particular area, the investigator must
have some knowledge of the technical requirements of that plant.

2. Plan of Procedure

Once the purpose of inquiry is clearly stated and a working
knowledge of the phenomena under investigation is acquired,
the investigator must prepare a plan for conducting the investigation.
In drawing up the plan, attention should be directed
to the following points :

(i) Scope of inquiry,

(ii) Determination of the unit,

(iii) Technique of data collection, and

(iv) Degree of accuracy.

(I) Scope of Inquiry

The scope of a particular statistical inquiry will be decided
with reference to either (a) the space, (b) the time, or (c) the
number of items to be covered. As regards space, the statistician
would have the right to fix such limits as might serve his
purpose in the best possible manner. In practice, however, the
following limits are generally used :

Political and administrative divisions such as the country,
the state, the district, the city, the municipality, the ward, the
circle, the block, etc.

Economic divisions such as agriculture and animal husbandry,
mining, manufacturing, trade, transport, communications,
banking, professional, liberal arts, etc.

Natural or climatic divisions such as the plains, the 
mountains, the plateaus, the forests, the coasts, etc. 

As regards time, it must be noted that the work of collection
of the data must be finished within a reasonable time. For, if
more than reasonable time is taken in the collection of the
numerical facts, conditions might change and the data collected
may be rendered useless for the purpose of the inquiry in hand.
‘Reasonable time’ depends upon the nature of the phenomenon
under investigation. If the phenomenon is one in which
conditions change quickly and frequently, the duration of the
process of investigation should be narrowed to such an extent
that there is no possibility of a change affecting the data.
Certain problems are of topical interest only and in such cases
the plan of statistical investigation should be so devised that the
entire work is finished before the inquiry ceases to have an
interest.

The decision with regard to the number of items that are to
constitute the data is, in fact, a very important one. In
other words, it is the question of choice between the
census and the sampling technique of data collection. By
census method we mean a method where each item constituting
the population or the universe is enumerated. Sample method,
on the other hand, means a system where only a limited number
of items is taken into account. This limited number of items
is regarded as the sample of the population. The way or ways
in which the sample should be chosen, and the question of deciding
between these two alternative methods of data collection, are
discussed in detail in Chapter 6.
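The contrast between the census method and the sample method can be illustrated with a short sketch. The wage figures, the population size of 1,000 and the sample size of 50 are hypothetical, and simple random selection is only one assumed way of choosing the sample; the text leaves the actual methods of selection to Chapter 6.

```python
import random
from statistics import mean


def census_mean(population):
    """Census method: every item in the population is enumerated."""
    return mean(population)


def sample_mean(population, sample_size, seed=0):
    """Sample method: only a limited number of items is taken into account.

    Simple random selection without replacement is an assumption made here.
    """
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)
    return mean(sample), sample


# Hypothetical universe: monthly wages (in rupees) of 1,000 workers.
_rng = random.Random(42)
wages = [_rng.randint(200, 600) for _ in range(1000)]

full = census_mean(wages)
estimate, chosen = sample_mean(wages, sample_size=50)

print(f"census: {len(wages)} items enumerated, mean wage = {full:.1f}")
print(f"sample: {len(chosen)} items enumerated, mean wage = {estimate:.1f}")
```

The sample mean will generally differ a little from the census mean; how much difference is tolerable is a question of the degree of accuracy required.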

(II) Determination of the Unit

Quantitative science presupposes the presence of a unit.¹
A statistical unit is necessary not only for the collection of data
but also for their interpretation and presentation. For the
gathering of raw material clear definition of the unit is of primary
importance; for the interpretation of the results the unit assumes
still greater importance, and the presentation of facts without
units is valueless. Thus in all the processes of the statistical
machine, the statistician is working with units, whether they be
clearly or poorly defined, suitable or unsuitable to the uses to
which they are put.

¹ Robert Riegel : Elements of Business Statistics.

Importance of Statistical Units

For the correct solution of any statistical problem, it is
not enough that the facts are collected with the maximum of
accuracy; it is essential too that the unit employed is appropriate.
A mistake in the selection of the unit is more
harmful than mistakes in the collection of data. As an illustration,
suppose that it is desired to restrict the output of cotton
textiles and as such production quotas are to be fixed on the
basis of the size of mills. Now the size of a manufacturing
concern may be measured in terms of either the ‘volume of
output’, ‘capital investment’, ‘number of wage earners employed’,
‘the volume of product’ or ‘the number of looms
installed’. For our purpose, however, it is the volume of output
that is the best criterion for determining the size. If a unit other
than the ‘volume of output’ is chosen in this particular case the
results are bound to be incorrect and misleading. The importance
of this concept (unit) is still greater in the interpretation
of data for the simple reason that more people use statistics
than compile them. If we collect data regarding strikes and
lockouts in the cotton textile industry from the different states of
India during a particular year, the figures so collected will
show the total number of man-hours lost due to work stoppages.
Now if work stoppages are not properly defined some states
might give the figures of only those ‘work stoppages’ which
lasted for more than a month, some might give figures only for
those in which the work stoppages lasted for more than a week,
and some states might give figures even of those which lasted
for only one day or less. The obvious result would
be that the data so obtained would not be comparable and
would fail to give an idea about the relative condition of
industrial relations in different states.
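The effect of an undefined unit can be seen in a small sketch. The durations below are hypothetical, and the three thresholds simply mirror the three readings of ‘work stoppage’ mentioned above; the same underlying record yields three different counts.

```python
# Hypothetical durations (in days) of the work stoppages in one state.
durations_in_days = [0.5, 1, 3, 8, 10, 12, 35, 40]


def count_stoppages(durations, minimum_days):
    """Count only the stoppages that lasted longer than `minimum_days`."""
    return sum(1 for d in durations if d > minimum_days)


# Three possible definitions of the unit 'work stoppage'.
counts = {
    "lasting more than a month": count_stoppages(durations_in_days, 30),
    "lasting more than a week": count_stoppages(durations_in_days, 7),
    "of any duration": count_stoppages(durations_in_days, 0),
}

for definition, number in counts.items():
    print(f"stoppages {definition}: {number}")
```

If each state reported under a different one of these definitions, the figures could not be compared even though every state counted its own records correctly.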

The appropriateness of a unit depends on the purpose of
statistical investigation. The purpose of the statistical inquiry
and the definition of the unit are reciprocal. The latter is determined
by the former and the former is governed by the latter.
‘Statistical units cannot be defined without regard to the object
in view and the purpose of inquiry cannot be stated with
sufficient accuracy without a clear-cut notion of the units.’

In the selection of the units the following are the main
precautions that must be observed :

1. The unit of measurement must be definite and specific.

2. The unit must be of such a nature that it may be
correctly ascertained.

3. Care must be taken to ensure homogeneity and uniformity.
A given choice of unit is unsatisfactory if it
implies different properties on different occasions or
to different persons.

4. The unit should be stable.

5. The unit must be appropriate for the purpose. ‘The
strength of an army from the fighting standpoint
depends upon the number of combatants, whilst from
a commissariat standpoint it depends upon combatants,
plus auxiliary services, plus sick and wounded.’¹

Classification of Units 

The classification may be made on the basis of either :

1. the origin of the unit, or

2. the function performed by the unit.

1. Classification of Statistical Units as regards their
Origin²

Statistics has been defined as the technique of counting and
measurement. In the counting operations the unit employed
is invariably a unit which arises spontaneously such as a man,
a house, an accident, a city, a town, etc. In the process of
measurement the unit is artificially designed solely for the object
of measurement, such as a kilogram, a rupee, a kilometre, etc.
To quote Connor, ‘Individuals or objects separated by distinctive
marks may be counted, whilst other objects must be
measured by reference to arbitrary or conventional units.’

¹ L. R. Connor : Statistics in Theory and Practice.
² Robert Riegel : op. cit.

I. Spontaneous units : used in counting process

(a) Natural units.

(b) Produced units : objects and qualities.

II. Artificial units : used in measurement process

(a) Mensurational units.

(b) Pecuniary value units.

I. (a) Natural Units. The examples of such units are a
person, a tree, a cow, etc. Such units are ordinarily superior
to other types in definiteness and fulness of meaning. As an
example compare a census of ‘persons living in Delhi’ with a
census of ‘industrial workers’. The former involves counting
the number of natural units, and as such instructions to enumerators
are simple and misunderstandings seldom arise. In
the latter case, however, the term ‘industrial worker’ is hard
to determine. What one person would consider an industrial
worker may be rejected by another. It is, therefore, necessary
to have a clear definition of the term ‘industrial worker’.

It is sometimes alleged that the natural units vary in size
or other characteristics, and as such the collected data are not
comparable. Thus one man is taller than another, one tree is
much bigger than another, or one horse is much heavier or
stronger than another. But this is not a sound objection, as
such variations are no detriment because they are regularly
distributed about a mean in a shape so well known as to
partake of the nature of a ‘statistical law’.¹

¹ Robert Riegel : op. cit., p. 147.

I. (b) Produced Units : Objects. The examples of produced
units are a house, a car, a table, a typewriter, a ship, etc.
Produced units consist of ‘natural materials changed for
human advantages and use’. These units are not identical
with natural kinds, however familiar they may be. ‘Each
natural kind is unique by reason of a large number of separate
and distinct characteristics, whereas the distinguishing feature
of a produced object is its function.’² This type of unit has
some merits of the natural unit. But it has two serious
defects : (i) It is hard to frame an exact definition of such a
unit, because an object may be put to two or more different
uses. Thus if a room is being used as a sleeping room as well as
a study room it would be incorrect to call it a ‘study’ or a
‘sleeping room’. (ii) Another difficulty with such units is the
fact that they undergo temporal changes. Thus passenger
buses of today are much bigger than those of a decade ago.

Qualities. This unit consists of ‘qualities produced in beings
or things by human relations’. The examples are wages,
grades of a product, income, family, occupation, etc. The 
object remains the same but some quality is added, subtracted 
or modified. As the qualities are subject to change due to 
action of law or custom, exact definition becomes even more 
difficult. 

II. (a) Mensurational Units. These are the units for the
purpose of measurement, whereas natural kinds, produced
objects and qualities are for the purpose of counting. The mensurational
unit is conventional or arbitrary, whereas the other three
(mentioned above) are natural or spontaneous. Examples
of mensurational units are the ton, the kilogram, the metre, the
mile, the ton-mile, the year, etc. The difficulty with such
units is that the same word has many meanings. Abstractly,
all ‘ton-miles’ are alike; concretely they are different. To
quote Secrist, ‘While a ton is invariably a ton, and a mile a
mile, all tons except as to the one quality, weight, are not
necessarily the same, nor are all miles, except as to the one
quality, distance, always equivalent. One ton may be bulky,
low grade freight; another ton may be compact, high grade
freight. Likewise one mile may be of easy grade in a plain;
the other of heavy grade in mountainous tunnel. The conditions
necessary to the movement of one ton one mile, the ton-mile,
may be wholly dissimilar in spite of the common name
which is assigned to the services.’ Again, the mensurational
unit directs attention only to one characteristic feature of the
article, and indicates nothing regarding shape, quality, size or
material.

² Ibid.

The merits of such units lie in their importance as a common
denominator, convenience for expressing quantity and
ease in mathematical operations. Thus coal and iron may
be expressed in tons and the sum of quantities gives the total of
coal and iron mined. Thus the different articles are reduced
to a common denominator, either for purposes of comparison or
of combination.

II. (b) Pecuniary Value Units. It is a type of unit that
measures financial importance, as rupee, dollar, sterling, etc.
The principal defect inherent in such a unit is that statistics
stated in rupees can only yield indices of value and not of
quantity. As an example in the field of business, the amount
of turnover (on an average) of a grocery and provision merchant
is much less than the amount of turnover of a jeweller,
whereas in reality the grocer might have sold greater quantities
as regards weight and number than the jeweller.

The advantage of the pecuniary value unit is its wide applicability
as a common denominator and the ease of utilising it.
In discussing the foreign trade of India or the balance of payments
problem, the pecuniary value unit is employed. Had
there been no pecuniary value unit, it would have been a
problem ‘how to add tons, pieces, yards, bales and bushels,
etc.’ For purposes of comparison between diverse articles, the
pecuniary value unit serves as a common denominator. Thus
we can compare the production of cereals with that of minerals.
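A minimal sketch of the pecuniary value unit acting as a common denominator follows. The commodities, quantities and prices are all hypothetical; the point is only that quantities in unlike units cannot be added, while their rupee values can.

```python
# Hypothetical output figures: each quantity is in its own unit,
# with an assumed price in rupees per unit.
production = {
    "wheat": {"quantity": 1200, "unit": "quintal", "price_per_unit": 80},
    "cloth": {"quantity": 5000, "unit": "metre", "price_per_unit": 3},
    "coal": {"quantity": 300, "unit": "tonne", "price_per_unit": 40},
}


def value_in_rupees(item):
    """Convert a physical quantity into its pecuniary value."""
    return item["quantity"] * item["price_per_unit"]


# Quintals, metres and tonnes cannot be summed directly,
# but their values in rupees can.
values = {name: value_in_rupees(item) for name, item in production.items()}
total_value = sum(values.values())

for name, value in values.items():
    share = 100 * value / total_value
    print(f"{name}: Rs. {value} ({share:.1f} per cent of total value)")
print(f"total: Rs. {total_value}")
```

The defect noted above also shows here: cloth has the largest physical count but not the largest value, so the rupee totals are indices of value and not of quantity.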

Classification according to Functions of Units

As regards their use in statistics, units may be classified as :

I. Units of enumeration,

II. Units of analysis and interpretation, and

III. Units of presentation.

Units
    Units of Measurement
        Units of Enumeration
            Simple
            Composite
        Units of Analysis
            Ratio or coefficients
    Units of Presentation
        Time : (i) Year, (ii) Month, (iii) Day
        Space : (i) Nation, (ii) State, (iii) City, (iv) Locality, etc.
        Condition : (i) Qualitative, (ii) Quantitative

I. Units of enumeration or estimation. These units may be
divided into (i) simple, and (ii) composite. A simple unit is
one in which ‘one determining consideration is prescribed’, i.e.,
a unit in which the ideas expressed are general and only class 
differences are distinguished. The examples are : an animal, 
a farm, a mile, a ton, an accident, a house, a man, a dispute, 
a room, a city, etc. Such units have no limiting qualifications, 
and are easily defined. 

If we add a qualifying limit to the simple unit the unit
becomes composite. The effect of adding a qualifying phrase
may be (i) to describe more correctly the general concept,
(ii) to limit the class which it names, (iii) to add to the difficulty
of defining it. Examples of composite units are a farm
animal, a ton-mile, an industrial accident, a rent-free house, an
industrial dispute, a vacant room, a credit sale, a chain-store, etc.
The chance of error and bias increases as we add limiting 
conditions to simple units. 

II. Units of analysis and of interpretation. Such units are those
in which things or attributes of things are not only named but
also compared. To compare things, rates, ratios and coefficients
are used. The ratios and coefficients may relate to
time, to space, or to conditions in time or space. Road
accidents may be expressed by number or by severity, but
related to months or years. The production of food crops may
be expressed in quintals, but related to the area under food
crops. Deaths during a given period, or for a particular
region, may be expressed in numbers, but they may be related
to the entire population of the same age, caste and sex
characteristics.
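As a small illustration of a unit of analysis, the sketch below relates a count to its base to give a rate. The death, population, production and area figures are hypothetical, and the helper name `rate_per` is an assumption made for this example.

```python
def rate_per(count, base, per=1000):
    """Express `count` relative to `base`, scaled to `per` units of the base."""
    if base <= 0:
        raise ValueError("base must be positive")
    return count * per / base


# Hypothetical figures for one region in one year.
deaths = 1250
population = 100000
death_rate = rate_per(deaths, population, per=1000)

# Hypothetical food-crop figures: production related to area.
production_quintals = 48000
area_hectares = 4000
yield_per_hectare = rate_per(production_quintals, area_hectares, per=1)

print(f"death rate: {death_rate} per thousand of population")
print(f"yield: {yield_per_hectare} quintals per hectare")
```

The raw counts (1,250 deaths, 48,000 quintals) name the things; the rates make regions or years of different size comparable.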

III. Units of presentation. Units of presentation are of three
types : (i) time, (ii) space, and (iii) condition. For instance,
the manufacturing expenses of a group of concerns may be
measured and presented by years, by location, by size, by nature
of industry, or by the efficiency and nature of management.
The turnover may be shown by months or by zones of territory.

(III) Technique of Data Collection

Having determined the scope of inquiry and units of
measurement, the next step in drawing up the plan of investigation
is to determine the best method of data collection.

In order to obtain statistical material the investigator may
either (i) go to the records of some institution, whether public
or private, that collects and publishes data as a routine, or
(ii) make a special survey, i.e., conduct a field inquiry. Thus
there are two sources of information available to the statistician.
The former is usually termed the ‘Secondary Source’ and the
latter the ‘Primary Source’. Each source of information has its
own merits and demerits. The selection of a particular source
is dependent upon a variety of factors such as (i) the purpose of
inquiry, (ii) time required, (iii) the accuracy desired, (iv) funds
available, (v) other facilities available, and (vi) nature of the
person conducting investigation. However, this subject is so
broad that a subsequent chapter is devoted to a discussion of
the techniques of data collection.

(IV) The Degree of Accuracy

Another important step in planning a statistical investigation
is the determination of the standard of accuracy that is to
be observed in the collection of statistical material. While
determining the standard it is necessary to bear in mind two
things : (i) the degree of accuracy that is necessary, and (ii) the
accuracy which is actually attainable. Absolute accuracy is
neither possible nor very necessary. When the number of
informants is very large and there is a great divergence in the
standard of their education and intelligence it is not possible to
get correct information about each one of them. Some may
unconsciously be left out of consideration altogether and a few
may fail to supply information of a requisite standard. Thus
some inaccuracy is bound to creep in even in spite of our best
efforts. The results of statistical analysis, however, will not be
significantly affected by this unavoidable inaccuracy. Statistics
is concerned with an overall picture of the universe and minor
discrepancies will not alter the ultimate conclusions. Mathematical
exactness in statistical investigation, therefore, is
neither possible nor very necessary. But statistical data must
possess a reasonable degree of accuracy, and every effort should
be made to attain it. What is ‘reasonable’ accuracy will
depend upon the circumstances of each case. For instance, in
measuring the height of an individual even centimetres will be
considered, while in measuring the distance between two towns
even hectometres may not be given any importance.
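One way to read the height and distance example is in terms of relative error, that is, the error as a fraction of the quantity measured. The sketch below is only an illustration under that assumption: the height of 170 centimetres and the distance of 50 kilometres are hypothetical figures, not taken from the text.

```python
def relative_error(absolute_error, measured_value):
    """Error expressed as a fraction of the quantity being measured."""
    return absolute_error / measured_value


# Hypothetical height: 170 cm, measured with a possible error of 1 cm.
height_error = relative_error(1, 170)

# Hypothetical distance between two towns: 50 km = 50,000 m,
# measured with a possible error of 1 hectometre = 100 m.
distance_error = relative_error(100, 50_000)

print(f"height:   {height_error:.2%} of the quantity measured")
print(f"distance: {distance_error:.2%} of the quantity measured")
```

On these assumed figures an error of a whole hectometre in the distance is proportionately smaller than an error of one centimetre in the height, which is why the coarser unit is acceptable there.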

Approximation 

In some cases, the results of the measurement, estimation
(or calculation) of the statistical phenomena are not given in
all the digits that are made available. Thus if the census of
population of a certain district gives a total of 85,39,421, the
last three digits are dropped by approximation and the
population is given as 85,39,000. This is called the method of
approximation and is adopted because it facilitates calculation
and simplifies comparisons. Besides, if the results are put to
the last digit they are likely to give an impression of an
accuracy which they do not possess.

Methods of approximation. The first thing that is necessary
in this connection is to determine the degree to which accuracy
is to be maintained. Thus if the data are to be correct to the
nearest thousand it means that the last three digits are to be
rounded. In this case amounts less than 1,000 will be regarded
as fractions of a thousand, more than 500 being counted as more
than one-half and less than 500 being taken as less than one-half.
This being understood, the rules of approximation may
be given as follows :

A method of approximation that is commonly adopted is to
ignore fractions less than half and count fractions more than
half as full. Fractions equal to exactly half may be counted as
full or dropped at the discretion of the investigator. Thus if
accuracy is to be maintained up to the nearest thousand :

48,27,681 will become 48,28,000
48,27,481 will become 48,27,000
48,27,500 will become either 48,27,000 or 48,28,000.

If accuracy is to be maintained up to the nearest hundred :

48,27,681 will become 48,27,700
48,27,481 will become 48,27,500.

If accuracy is to be maintained to the nearest tenth of a
unit (nearest to first decimal place) :

15.213 will read as 15.2
15.486 will read as 15.5.

If accuracy is to be maintained to the nearest hundredth of
a unit (nearest to two decimal places) :

15.213 will become 15.21
15.486 will become 15.49.

Note. (1) Approximation is sometimes done by discarding
the number of digits not required, and no consideration is
made of the fraction which is more than half or less than half.
Thus :

28,97,835 will become 28,97,000
28,97,349 will become 28,97,000.

(2) Approximation can also be done by counting the
fraction, whether more or less than half, as full, thus :

28,97,835 will become 28,98,000
28,97,349 will become 28,98,000.
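The three methods of approximation described above can be sketched in a few lines. The function names are assumptions made for this illustration, and the figures are written without the Indian digit grouping (48,27,681 appears as 4827681). The `half_up` flag stands for the investigator's discretion when the fraction is exactly half.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_nearest(value, unit, half_up=True):
    """Ignore fractions below half a unit; count fractions above half as full.

    A fraction of exactly half is counted as full only when `half_up` is True.
    """
    quotient, remainder = divmod(value, unit)
    if remainder * 2 > unit or (remainder * 2 == unit and half_up):
        quotient += 1
    return quotient * unit


def discard(value, unit):
    """Note (1): drop the digits not required, whatever the fraction."""
    return value // unit * unit


def count_as_full(value, unit):
    """Note (2): count any fraction, large or small, as a full unit."""
    return -(-value // unit) * unit


def round_decimal(text, places):
    """Round a decimal figure (given as text) to `places` decimal places."""
    quantum = Decimal(10) ** -places
    return Decimal(text).quantize(quantum, rounding=ROUND_HALF_UP)


print(round_nearest(4827681, 1000))   # nearest thousand
print(round_nearest(4827481, 100))    # nearest hundred
print(discard(2897835, 1000))         # digits discarded
print(count_as_full(2897349, 1000))   # fraction counted as full
print(round_decimal("15.486", 2))     # nearest hundredth
```

Integer arithmetic and the `decimal` module are used here rather than the built-in `round`, which rounds exact halves to the nearest even digit and would not follow the rule as stated.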



Chapter 5 
Collection of Data (II)


Primary and Secondary Data

The term ‘primary data’ refers to the statistical material which
the investigator originates for the purpose of the inquiry
in hand. Thus if it is desired to conduct an inquiry into the
cost of living of the workers in a certain mill, and if the facts
pertaining to this inquiry are collected by the investigator or
his representatives from the workers themselves, such data
would be termed primary data.

The term ‘secondary data’, on the other hand, refers to that
statistical material which is not originated by the investigator
himself but which he obtains from someone else’s records.
Thus if instead of obtaining our data from the workers themselves
we get them from the records of the Trade Union, or
from some other source, the data will be called secondary data.

The difference between primary and secondary data is 
largely one of degree. Data which are primary in the hands 
of one person, may be secondary in the hands of another. 
Thus the data collected during census operations are primary 
to the census department of the Government of India, but to a 
person who makes use of these data for further research they 
will be termed ‘secondary’.

On this distinction of primary and secondary data, the
methods of collecting the statistical material may be classified
as :

1. Primary Methods,

2. Secondary Methods.

Primary Methods 

For obtaining the primary information the investigator
may adopt any one of the following methods : (i) Direct
personal observation, (ii) Indirect personal interviews, (iii)
Information from correspondents, (iv) Mailed questionnaires,
and (v) Questionnaires to be filled by the enumerators.

Direct personal observation. According to this method
the investigator obtains the data by a personal observation
(or interview) of the objects under study. This means that
the investigator presents himself personally before the informant
and obtains first-hand information. A good example
of work along this line is Professor Zweig’s investigation
of the expenditures and living conditions of the
labouring classes, by actually living for years with various
families.

This procedure of data collection is adopted when the
field of inquiry is small and there is a desire for greater
accuracy. Due to the personal supervision and personal
touch of the investigator that is available under this method
the data collected are bound to be more accurate. But this
method has the disadvantage of being expensive and lengthy.
This procedure of data collection, however, is used very
seldom because most statistical inquiries
have a much wider field than is possible for any one investigator
to cover single-handed within a reasonable period of
time. The success of this method depends largely, among
other things, upon the personal qualities of the interviewer :
his tact, diplomacy, courage, conviction, curiosity and capability
of understanding the psychological and instinctive
reactions of those whom he interviews. In the words of King,
‘This type of inquiry, while admirable because of additional
accuracy due to personal supervision, must needs cover too
narrow a field to be representative and is also liable to too
large an injection of the personal element. The prejudices
and desires of the investigator become too often unconsciously
woven into the fabric of his conclusions.’

Indirect personal interviews. This method is adopted generally
in cases when the information desired is complex, or there is
reluctance or indifference on the part of the informants, or when
the study in hand is extensive. Under this method, instead
of directly approaching the informants, the investigator interviews
several third persons who are directly or indirectly
in touch with the information sought. Such a procedure is
followed by the inquiry committees and commissions appointed
by the Government of India. The committee or the investigator
selects the persons (known as witnesses) and collects information
from them by asking questions decided in
advance.

Success or failure of such a method depends, among other
things, upon (i) the representative character of the witnesses
selected for investigation ; (ii) the personal knowledge of the
witnesses about the things asked ; (iii) the prejudices and
desires of the witnesses about supplying information ; (iv) the
personal qualities of the witnesses as regards definiteness in
stating what is wanted.

Information from correspondents. Under this method there
is no formal collection of data, but local agents or correspondents
are appointed in the different parts of the area under
investigation. These correspondents are asked to collect
information and transmit it to the investigator. They are
advised to use their own judgment as to the best way of
obtaining it. When accuracy is not of prime importance
and only approximate results are desired, this method is
specially suited because of its being cheap and expeditious.
In cases where accuracy is desired, it is not a reliable
method because the prejudices and desires of the correspondents
may consciously or unconsciously influence the data
collected.

Mailed questionnaire method. Under this method a schedule
(or a questionnaire) containing a number of questions pertaining
to the object under investigation is prepared. These
schedules are sent by post to the informants, and are to be
returned by the informants duly filled in. This procedure
is cheap and fairly expeditious, provided the informants are
intelligent enough to answer the questions correctly. By this
method a large field of investigation may be studied at only
a fraction of the expense needed to pay for sending out
enumerators. But this method has some drawbacks. The
informants may not send back the schedules, and even if they
return the schedules they may be inaccurately filled in. This
method is generally adopted by private individuals. The
success of this method depends upon :

1. The ability and knowledge of the informants about the
facts wanted ;

2. The ability with which the ‘questionnaires’ are prepared ;
and

3. The favourable response from the informants.

It is claimed that this method saves time in obtaining
information. But this is partly illusory ‘because of the long
delay incidental to the receipt of replies and the handling
of the large follow-up correspondence occasioned by the fragmentary
and incorrect answers.’

Questionnaires in charge of enumerators. The last method, the
one that is generally employed in large investigations, is
to delegate the task of interviewing informants to select
agents. These agents are termed ‘enumerators’. They
may be employed either on a pay basis or they may offer their
services free.

The enumerators who are in charge of our population
census questionnaires are honorary workers. These agents are 
provided with standardised questionnaires and specific training 
and instructions are given to them regarding the way in which 
the schedules are to be filled and the information elicited. 

For private investigation, this procedure is too expensive
to be undertaken. This method has an advantage over
the previous one inasmuch as schedules will be complete and
comparatively accurate. The enumerator will see that only
relevant answers are received from the informants. By cross
questioning, the enumerator will be able to get correct answers.
The advantages of the agency method are, however, very
largely nullified by careless drafting of schedules, inadequate
instructions to agents and selection of incompetent enumerators.

The enumerator must possess certain qualities :

1. Sufficient knowledge to comprehend and follow the
instructions ;

2. Ability to distinguish between good and bad answers ;

3. Tact in extracting information desired ;

4. Open-mindedness to record the facts as they appear ;

5. Unprejudiced and unbiased attitude ; and

6. Courtesy.

A high degree of uniformity in the information collected
will be possible only if the enumerators are competent
and well trained. Again, it is only competent enumerators
who will be able to get correct answers to otherwise complex
questions.

Drafting the questionnaire. The reliability of statistical
material obtained depends largely upon the correctness with
which the questions asked are answered, which ultimately
depends upon the adequacy of the questionnaire, knowledge
and desires of the informants and also of the enumerators.
The preparation of a questionnaire is a highly specialised art.
One cannot hastily draft the questions, have the form
mimeographed and get it either mailed out or handed over
to the agents, and expect to obtain correct and reliable
results. If the questionnaire is unduly long, or the questions
are too complex, it is quite likely that it may be thrown into
the wastepaper basket. 1

1 Ilersic, A. R.: Statistics and their Application to Commerce, p. 18.

The way in which the schedules are to be collected will
also have an important bearing on the number, nature and
form of the questions asked. If the schedules are to be
filled by the enumerators, the questions may be more complex
and more numerous than in the case when the schedules are
directly mailed to the informants. The enumerator may
explain to the informants the nature and purpose of the inquiry,
and also the questions asked. He may also clear up doubtful
points and ask corroborative questions. In the case when
information is obtained by mailing the schedules, the questions
themselves must carry conviction, be self-explanatory, consistent
and persuasive. Personal appeal for information, best
made by human contact, is then made through the printed
rather than the spoken word.*

In drafting the questionnaires the following points need 
careful attention of the draftsman : 

1. The co-operation of the informants must be obtained 
and the interest in the inquiry be created in them. Their 
co-operation may be secured by engaging the support of some 
association with which the recipients of the schedules are 
associated. Interest may be aroused by the promise to supply 
free copies of the fruits of inquiry or by the promise of secrecy 
regarding the schedules. Inducements such as gifts should not
be offered. The best inducement is a brief and attractive
questionnaire.

2. Those questions should be avoided which are likely
either (a) to arouse the resentment of the informants, or (b) to
offend or frighten the informants, or (c) to allow of evasive
answers, or (d) to be answered in prejudice. Those questions
must be excluded from the questionnaire whose answers put
the informant on the defensive or touch his pride or prestige.

3. The individual questions must be brief and simple. 
Complicated and long-winded questions irritate and result in 
incorrect answers. The schedule should be as simple as possible 
because each additional question means additional expense and 
extra labour in classification and tabulation. 

4. The number of questions should be consistent with the 
scope of the information sought. 

5. The questions should be free from ambiguity, should
not allow double interpretation, should be arranged systematically
and should not involve duplications. To quote Bowley,
‘The questions must be so clear that misunderstanding is
impossible and so framed that the answers will be perfectly
definite, such as simple number, or ‘yes’ or ‘no’; such as
cannot give offence, or appear inquisitorial; or lead to
partisan answers, or suppression of part of the facts.’

* Secrist: op. cit., p. 66.

6. The units of measurement should be clearly and precisely
defined.

7. The forms must contain necessary instructions so that
making mistakes be rendered difficult. The instructions must
not be too complicated.

8. The standard of accuracy required (whether the answers 
are to be correct to rupees or paise, to years or months) 
must be written in the schedule. 

9. The letter transmitting the questionnaire should be short 
and dignified and yet should ‘sell’ the ideas to the recipients. 

10. Once the questionnaire has been drafted, it must be
tried on a dozen or so of the potential informants in order to
assess the probable reaction of the wider public to be covered
later.

Secondary Methods

It is not always necessary to conduct special surveys for 
the purpose of obtaining statistical material. Such material 
may be obtained from the records of institutions that collect 
and publish statistics as a part of their routine duties. The 
most important routine compilers and suppliers of statistics 
are governments. Next in importance come the semi-official
institutions like municipalities, railways and state-owned
undertakings. In addition to these there are trade associations
that publish statistical data. Statistical material also
appears in trade journals, market reports, magazines and other
periodicals. With reference to the manner of its publication, 
the secondary data may be divided into three groups. 

1. Continuous or regular data. Statistical data published
regularly at short, known intervals are called continuous or
regular data. The examples are the weekly index number of
wholesale prices, monthly figures of exports and imports,
monthly figures of workers involved in work stoppages, monthly
indices of the production of certain selected industries in
India and so on.




2. Periodical data, which are regularly published at long
intervals such as Indian Census, Statistical Abstract for India,
Agricultural Statistics of India, Trade Statistics, and statistical
tables relating to banks in India.

3. Irregular data, consisting of special studies of statistical
phenomena, with no regular dates of publication, e.g., the
reports of the National Income Committee, Tariff Commission
reports, and the reports of the various committees and
commissions appointed by the Government from time to time.

Editing Primary Data 

The returned questionnaires (filled by the informants or by
the enumerators) should be scrutinized at an early stage with
a view to detect errors, omissions and inconsistencies. Before
the collected information is tabulated, each individual schedule
must be checked in detail to ascertain whether or not it has
been answered in full and accurately. Defective schedules
should be returned for amendment. But if the figures are
manifestly erroneous, they may be rejected or corrected. In
order that the schedules may be corrected by the investigator
himself he must have reasonable grounds upon which to
act. The work of editing requires skill and scientific impartiality
of a very high degree. Bailey and Cummings name four
types of editing: editing for consistency, uniformity, completeness
and accuracy. 1

1 Quoted by Crum, Patton and Tebbutt: Introduction to Economic Statistics, p. 57.

In order to judge whether the returned schedules are
consistent, we must compare the answers of those questions
which were designed to be mutually confirmatory. If the
answers of such two questions appear to be mutually contradictory,
it is essential to determine which, if either, is correct. To
obviate this difficulty, the most obvious solution is to send back
the schedule, though this procedure will not always yield a
decisive result and will in any case consume considerable time
and effort. An alternative method is to discard such schedules
entirely, but again this is not desirable as we would have
to sacrifice some important items of information bearing upon
the problem in hand.

The returns may be edited to avoid lack of uniformity. 
Even in primary collections, there will be an occasional return
in which the answers are submitted in the wrong manner.
Perhaps the most general mistake of this sort is in the statement
of units: time may be expressed in years rather than months.
Such mistakes can be remedied easily and uniformity may be 
attained. 

On the other hand there are some such mistakes which can 
only be remedied by further correspondence. Instances of the
mistakes of this type are the stating of figures for wholesale
and retail trade when information is sought for retail trade
only, the reporting of figures for a calendar year when it is
sought for the fiscal year, etc.

Editing for completeness is, in the main, a straightforward
operation. Certain space often appears on the schedule for 
such things as totals, ratios, etc., which the informant was 
not asked to fill in. These spaces are to be filled by the investi¬ 
gator himself. If, however, the return is incomplete, in the 
sense that the informant has not answered a particular ques¬ 
tion, then further correspondence is essential. 

Editing for accuracy is the most difficult task of the editor.
Inaccuracies, other than inconsistencies, are seldom apparent
from internal evidence in the return. Only experience can 
help an investigator to detect such errors. 

The editor should bear in mind that he must never destroy,
by clipping or erasure, the original reply and the corrected
items should be entered separately. The process of editing
is by no means an unimportant and routine operation; rather
it requires marked ability, scrupulous care and a rigid adherence
to scientific objectivity. 1

Editing Secondary Data

1 Crum, Patton and Tebbutt: p. 58.

Statistical material obtained from secondary sources is
not always as reliable as that from the primary source.
‘Statistics, especially other people's statistics, are full of pitfalls
for the user. Terms may be used in peculiar senses; meanings
may have imperceptibly changed; external factors
may operate to produce discrepancies.’ 1 As such it is necessary
to scrutinize the secondary data in the light of the following
points:

1. The type and purpose of the institution that publishes 
statistics as a routine. 

2. The purpose for which the data are issued and the 
consumers to whom they are addressed. The purpose may be 
(i) general or specific, (ii) restrictive or inclusive, (iii) transient
or permanent, and (iv) scientific or unscientific.

3. The nature of the data themselves. Are the data 
biased? Bias may be due to (i) wilfully eliminating parts of
the facts, (ii) basing comparisons on inadequate data, and
(iii) relating them to unrepresentative periods or conditions.
The next question that should be asked is : ‘Are the data 
samples only or complete enumeration ?’ 

4. In what types of units are the data expressed? Are
they the same at different times, at different places, and for
all cases at the same time or place?

5. Are the data accurate ? 

6. Do the data refer to homogeneous conditions?

7. Are the data germane to the problem under study?


1 Connor: op. cit., pp. 10-11.



Chapter 6 

Sampling and the Concept of Error 


In planning a statistical inquiry it is to be determined as to
whether the investigation is to take into account the whole
‘population’ or only a part of it. When it is concerned with
the whole population (i.e., when complete enumeration is
undertaken) it is called the ‘Census Enumeration’, and when
only a part of the population is taken into account it is called
the ‘Sample Enumeration’.

The selection of a part of an aggregate (sampling) to represent
the whole aggregate is a long established practice. A
handful of grain taken from a heap has since long been commonly
taken to represent the quality of the entire heap. The
fact that the characteristics of the sample are able to provide
an approximately correct idea about the characteristics of the
population is based upon the operation of the Law of Statistical
Regularity and the Law of Large Numbers.

Law of Statistical Regularity

This law has been derived from the mathematical theory of
probability. It enunciates that a group of objects selected at
random from a larger group (referred to as the population) tends
to possess the characteristics of the larger group. Now this
law will operate only if the objects have been chosen at random.
By choosing at random is meant a method of selection in which
‘every item has got an equally likely chance’ of being included
in the sample. Again, the principle of statistical regularity is
only a tendency. This means that the results given by the
sample are likely to be only approximately (and not exactly)
the same as the results given by the population. This means
that if a number of samples are taken from a particular population,
the results will vary from sample to sample on the one
hand and from the population on the other. But the difference
will be small provided the samples are reasonably large. It
is because of the operation of this law that we are able to obtain
a sufficiently accurate idea of the population without following
the costly method of complete enumeration.
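The tendency described above can be tried out on a computer. The following is only a sketch under stated assumptions: the population is a made-up list of the figures 0 to 9,999, and the function name sample_vs_population is hypothetical. It draws a random sample in which every item is equally likely to be chosen and compares the sample mean with the population mean.

```python
import random


def sample_vs_population(population, sample_size, seed=0):
    """Return (population mean, sample mean) for one random sample."""
    rng = random.Random(seed)
    # rng.sample gives every item an equal chance of inclusion.
    sample = rng.sample(population, sample_size)
    pop_mean = sum(population) / len(population)
    sample_mean = sum(sample) / len(sample)
    return pop_mean, sample_mean


if __name__ == "__main__":
    # Hypothetical population: 10,000 made-up figures from 0 to 9,999.
    population = list(range(10_000))
    for n in (10, 100, 1_000):
        pop_mean, sample_mean = sample_vs_population(population, n, seed=n)
        # The difference is usually smaller for the larger samples.
        print(n, pop_mean, sample_mean, abs(sample_mean - pop_mean))
```

The sample mean will seldom equal the population mean exactly, which is the sense in which the law is ‘only a tendency’.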

Law of Inertia of Large Numbers 

This is a corollary of the Law of Statistical Regularity and
lays down that ‘large aggregates are more stable than small
ones’. This means that if the numbers involved are large the
total change is likely to be very small. This is due to the fact
that the movements of an aggregate are the resultants of the
movements of its separate parts. All parts do not move in the
same direction at the same time; and if one part moves in one
direction some other is likely to move in quite an opposite
direction. This means that the movement of one part will be
compensated by the movement of some other part. But this
will happen only if the group is large. Thus a study of wheat
production might reveal substantial variation from year to year
as regards a particular village but the total yield of wheat in
India for those very years may show only a slight variation.
Again, with the help of this principle it is usually possible to
predict, with reasonable accuracy, the relative frequency of
occurrence of a particular event by calculating a certain
mathematical probability. To illustrate, the chance of getting a head
or a tail in a toss of an unbiased coin is equally likely. But
when we actually throw a coin we will either get a head or a
tail. This means that the actual result will be very much
different from the expected one. If, however, the coin is tossed
1,000 times the actual frequency of head or tail will be very
much nearer the probable or expected frequency.
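The coin illustration can be imitated with a short simulation. This is a sketch only: the function name heads_fraction and the use of a pseudo-random generator in place of a real coin are assumptions made for illustration.

```python
import random


def heads_fraction(tosses, seed=0):
    """Fraction of heads in a given number of simulated fair tosses."""
    rng = random.Random(seed)
    # rng.random() < 0.5 plays the part of an unbiased coin showing a head.
    heads = sum(rng.random() < 0.5 for _ in range(tosses))
    return heads / tosses


if __name__ == "__main__":
    for tosses in (1, 10, 1_000, 100_000):
        # With one toss the fraction can only be 0.0 or 1.0; with many
        # tosses it settles close to the expected value of 0.5.
        print(tosses, heads_fraction(tosses, seed=tosses))
```

A single toss gives a fraction of either 0 or 1, far from one-half, while a long run of tosses gives a fraction close to one-half.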

Census versus Sample Enumeration 

In the method of complete enumeration (census), since all
items constituting the population have been observed, information
is available for each separate part of the universe. It is
not so in the sample method. Here (as has been already explained)
only a part of the universe is observed and from the
results given by the sample estimates for the population are
inferred. It, therefore, follows that whenever overall results of
the whole population are required sample method would admirably
suit the purpose. But if detailed results of the different
parts of the population are needed sampling would fail to give
the required information and census method will have to be
adopted.

Another factor that should be taken into account in choosing
the method of enumeration is the relative difficulty and cost
of organising a sample and a complete enumeration. Sample
method needs better organisation than the census method. In
fact in some cases of census enumeration the work may be done
by an already existing government machine and no organisation
of any importance may be necessary. The cost per unit, therefore,
is greater for a sample than for a complete census. But if
the size of the sample represents only a small fraction of the
whole population the expense in money and effort will be much
less in a sample than in the census method.

Sample method has another advantage over the census
method. If the information is collected from only a small
proportion of the population, its completeness and accuracy
can be easily ensured. When the numbers involved are few,
more attention can be given to each such member and by
persistent effort complete information can be obtained. This,
obviously, is not possible when numbers are large. Again the
information supplied by each individual can be carefully scrutinised
if the numbers are few and in case of any doubt inquiries
can be undertaken for verification. In a sample method it is
possible to collect more detailed information as an individual
would be more willing to supply information in detail if he knows
that he represents a small sample of the whole population.

Sample method requires comparatively much less time
both for the collection of data and their analysis. Time, as has
been already pointed out elsewhere, is an important factor in
statistical investigation and as such sample methods have a
superiority over the method of complete enumeration.





Again, in investigations of a sociological type where information
is to be collected from persons many of whom are illiterate,
or in investigation which requires skilled physical observation
and measurements, really qualified enumerators are needed.
In a sample method few such enumerators would be required
and hence it would be possible to undertake a sample enumeration
by a qualified staff rather than a complete enumeration by
untrained and less capable enumerators. 1

Sampling Methods 

Broadly speaking, there are two methods of selecting a 
sample: Deliberate Selection and Random Selection.

Deliberate Selection. Under this method the investigator
exercises his discretion in the matter of selecting the items that
are to be included in the sample. He picks up only those items
which he thinks are typical or representative of the population.
A sample selected in this manner will be representative if the
selection is judicious and is not influenced consciously or otherwise
by the personal prejudices of the investigator. Since it is
difficult for an ordinary man to take a completely detached
view of the subject under study and to cast off his personal
prejudices this method has nothing to recommend it.

1 The advantages of sampling method have been summed up by Professor
R. A. Fisher as follows:

‘I have made four claims for the sampling procedure. About the first
three, adaptability, speed and economy, I need say nothing further. Too
many examples are already available to show how much the new method
has to give in these ways. But, why do I say that it is more scientific than
the only procedure with which it may sometimes be in competition, the
complete enumeration? The answer, in my view, is in the primary process
of designing and planning an inquiry by sampling. Rooted as it is in
mathematical theory of the errors of random sampling, the idea of precision is
from the first in the forefront. The director of survey plans from the first
a predetermined and known level of precision; it is a consideration of
which he never loses sight, and the precision actually attained, subject to
well understood precautions, is manifest from the results of the inquiry.’

Random Selection. To eliminate the possibility of human
prejudices interfering in the selection of a representative sample
the method of random selection has been devised. Under this
method chance alone is allowed to determine which items from
the population are to be selected. This is done by numbering
all the members of the population, writing each number on a
slip of paper, mixing these slips thoroughly in a drum, and
making a blindfold selection of the desired number of slips. A
sample drawn in the above manner is known as a random
sample for this method ensures independently to every individual
an equal chance of being one of those who are selected
for the sample. In practice, it is rarely practicable to follow
a procedure of this type unless the population is fairly small.
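The drum-and-slips procedure has a direct counterpart on a computer. The sketch below is illustrative only; the function name draw_random_sample is hypothetical, and a pseudo-random shuffle stands in for the physical mixing of slips.

```python
import random


def draw_random_sample(population_size, sample_size, seed=None):
    """Imitate numbering the members, mixing the slips and drawing blind."""
    rng = random.Random(seed)
    slips = list(range(1, population_size + 1))  # one numbered slip per member
    rng.shuffle(slips)                           # mix the slips thoroughly
    return slips[:sample_size]                   # draw the desired number


if __name__ == "__main__":
    # Hypothetical case: 20 members drawn from a population of 200.
    print(sorted(draw_random_sample(200, 20, seed=7)))
```

Because no slip is replaced once drawn, no member can appear twice in the sample.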

Systematic Sampling or Quasi-random Sampling 

Although the principle of random selection is considered to 
be the best, much practical sampling is not fully random in 
selection. Thus when a complete list of the population is 
available a common method of selecting a sample is to take 
every nth item from this list. This method is called systematic 
or quasi-random sampling. Thus if the list of the population 
contains 20,000 items and a sample of 500 items is to be taken, 
the selection of every 40th item will give the required sample. 
The first entry is determined by selecting a number at random 
between 1 and 40. Thus if the first item obtained in this 
manner is fifteenth, fifty-fifth and ninety-fifth will be the 
second and third in the sample and in this manner all items 
will be picked up. This method is based on the assumption 
that a complete list of the population under study is available. 
In order to obtain good results from this method it is necessary 
that the list is arranged wholly at random. 
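The arithmetic of the example above (20,000 items, a sample of 500, every 40th item, starting at the fifteenth) can be sketched as follows. The function name systematic_sample and its parameters are assumptions made for illustration, and the sketch assumes the population size is an exact multiple of the sample size.

```python
import random


def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Positions (counted from 1) of the items picked from the list."""
    interval = population_size // sample_size  # 20,000 // 500 = 40
    if start is None:
        # Random starting point between 1 and the interval, inclusive.
        start = random.Random(seed).randint(1, interval)
    return [start + i * interval for i in range(sample_size)]


if __name__ == "__main__":
    picked = systematic_sample(20_000, 500, start=15)
    print(picked[:3])    # [15, 55, 95]
    print(len(picked))   # 500
```

Only the starting point is left to chance; every later item follows from it, which is why the method is called quasi-random.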

Stratified Sampling 

The sampling technique is based upon a fundamental 
assumption that the population to be sampled is homogeneous. 
Often the population is not homogeneous. There may be clear-cut
groups within the population which are radically different
as regards the characteristics under study. The economic
conditions of the poor and the rich are bound to be dissimilar.
The views of adolescents and grown-ups may be divergent as
regards a social problem. When the population is markedly
heterogeneous it is first subdivided into groups or ‘strata’ in
such a manner that all items in any particular group are
similar with regard to the characteristic under consideration.
From each such stratum items are chosen at random. The
number of items taken from each group may be in proportion
to its relative strength. The sample so formed is
called stratified.
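A stratified sample with items taken in proportion to the strength of each stratum might be sketched as below. The stratum names, the sizes and both function names are hypothetical; the rule for handing out leftover items (largest remainder first) is one reasonable choice and is not laid down in the text.

```python
import random


def proportional_allocation(strata_sizes, sample_size):
    """Share the sample among strata in proportion to their sizes."""
    total = sum(strata_sizes.values())
    # Whole-number share of each stratum, rounded down.
    quotas = {name: size * sample_size // total
              for name, size in strata_sizes.items()}
    # Any items still unallocated go to the largest remainders (assumed rule).
    leftover = sample_size - sum(quotas.values())
    by_remainder = sorted(
        strata_sizes,
        key=lambda name: (strata_sizes[name] * sample_size) % total,
        reverse=True,
    )
    for name in by_remainder[:leftover]:
        quotas[name] += 1
    return quotas


def stratified_sample(strata, sample_size, seed=None):
    """Choose items at random within each stratum, proportionally."""
    rng = random.Random(seed)
    sizes = {name: len(items) for name, items in strata.items()}
    quotas = proportional_allocation(sizes, sample_size)
    return {name: rng.sample(items, quotas[name])
            for name, items in strata.items()}


if __name__ == "__main__":
    sizes = {"poor": 6000, "middle": 3000, "rich": 1000}  # made-up strata
    print(proportional_allocation(sizes, 500))
    # {'poor': 300, 'middle': 150, 'rich': 50}
```

Within each stratum the choice is still left to chance; only the number taken from each stratum is fixed in advance.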

Multistage Sampling 

In multistage sampling the population is distributed into a
number of first-stage sampling units, and a sample is taken of
these first-stage units by some suitable method. This is the
first stage of sampling process. Each of these (selected)
first-stage units is further subdivided into second-stage units, and
from these again a sample is taken by some suitable method.
Further stages may be added if required.

Thus if a socio-economic survey is to be organised in a
State, the first stage of the sampling process would be to divide
the area into regions and select in a suitable manner a sample 
of these regions. The selected regions will again be divided 
into smaller sampling units (say villages) and from out of these 
will be selected the sample units that are to serve as sample for 
the purpose of the investigation. 
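The two stages of the survey example (regions first, villages next) can be sketched as below. The region and village names are invented, the function name two_stage_sample is hypothetical, and simple random selection is assumed as the ‘suitable method’ at both stages.

```python
import random


def two_stage_sample(regions, n_regions, n_villages, seed=None):
    """Pick regions at random, then villages at random within each."""
    rng = random.Random(seed)
    # First stage: a sample of regions (the first-stage units).
    chosen_regions = rng.sample(sorted(regions), n_regions)
    # Second stage: a sample of villages within each selected region.
    return {region: rng.sample(regions[region], n_villages)
            for region in chosen_regions}


if __name__ == "__main__":
    # Hypothetical State: 6 regions of 10 villages each.
    regions = {f"R{i}": [f"R{i}-V{j}" for j in range(10)] for i in range(6)}
    print(two_stage_sample(regions, n_regions=2, n_villages=3, seed=5))
```

Villages in regions that were not selected at the first stage have no chance of entering the sample, so field work is confined to a few areas.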

Statistical Error 

In statistical writings the word 'error' is used in a very 
restricted sense, and denotes the difference between estimation 
(as obtained by the investigator) and the actual size of the 
object under investigation. Inaccuracy due to arithmetic 
miscalculation is not an ‘error’ but simply a ‘mistake’.

Sources of error. Errors arise from a variety of causes. The
important ones are the following:

(i) It is not possible to attain mathematical precision in
measuring such variables as height, weight, distances, due
to the limitations of the measuring instruments. Height may
be measured correct to the tenth of a centimetre, or distances
may be measured correct to the nearest metre. This means
that there is always some difference between the measurement
obtained and the actual size. Again, the informants, due to
one reason or the other, may give incorrect answers. Errors
might also arise due to the incompetence of the investigator
or the selection of an unsuitable statistical unit. All such
errors are known as Errors of Origin.

(ii) When a sample inquiry is conducted the results given
by the sample will not be representative of the population if
the sample chosen is unduly small or the items included in it
are not selected in a proper manner. It is true to say that a
sample ‘can seldom if ever be a precise off-print of the whole.’
But if it is properly chosen the difference called sample error
can be easily estimated. Errors which are due to small samples
or insufficient coverage are known as Errors of Inadequacy.

(iii) Then there may be errors unconsciously committed
by enumerators in measuring the object. Such errors and
errors due to approximation are known as Errors of
Manipulation.

Classes of errors. Errors may be classified as:

(i) Biased or Cumulative errors,

(ii) Unbiased or Compensatory errors.

Biased or cumulative errors. Errors that are due to a bias
on the part of the enumerator, or in the technique of approximation
or in the measuring rod are called biased errors. An
error that is due to defective measuring rod or to any of the
other two causes enumerated above will always be in one
direction, and hence will be cumulative. This can be illustrated
by an example. Let us suppose a cloth dealer has a
metre that is one centimetre short. Now every metre of cloth
that is measured would be short by a centimetre, i.e., the error
is in one direction, the total error increases by a centimetre
every time a metre of cloth is measured. Thus if 30 metres of
cloth are measured, the total error would be 30 centimetres.

Approximation might also lead to bias if, while rounding
off, each individual figure is reduced or increased. Thus if
each item is reduced to the lower thousand the error is on one
side and goes on increasing as more and more items are
rounded. Exactly the same kind of error will arise if the
informant is prejudiced or has defective judgment. Thus if
women are asked to state their age most of them will understate
it by a year or so.

Unbiased or compensatory errors. If the estimate is likely to
err on either side, i.e., if the chances of making an over-estimate
or an under-estimate are almost equal, the errors are
called unbiased. Such errors tend to eliminate each other
and as such the ultimate result is not very much affected. Thus
if figures are rounded to the nearest thousand some approximations
are greater and some smaller than the original figures.
If a large number of such approximations are given their total
will not be very much different from the total of the original
figures. Similarly, if a cloth merchant whose intentions are
honest and whose metre is absolutely correct makes haste in
measuring cloth the chances are that some of the measures will
be more and some of others will be less with the result that the
measured length will not be very much different from what it
was intended to be.
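The contrast between cumulative and compensating errors can be checked on a few figures. The six figures below are made up for illustration, and the function names are hypothetical. Reducing every figure to the lower thousand gives errors all on one side; rounding to the nearest thousand gives errors on both sides that largely cancel.

```python
def to_lower_thousand(x):
    """Biased approximation: always round down."""
    return x // 1000 * 1000


def to_nearest_thousand(x):
    """Unbiased approximation: round to the nearest thousand."""
    return (x + 500) // 1000 * 1000


def total_error(figures, approximate):
    """Sum of (approximate figure - actual figure) over all items."""
    return sum(approximate(x) - x for x in figures)


if __name__ == "__main__":
    figures = [74182, 12650, 38499, 51730, 9210, 66840]  # made-up figures
    print(total_error(figures, to_lower_thousand))    # -3111 (cumulative)
    print(total_error(figures, to_nearest_thousand))  # -111 (compensating)
    # The cloth dealer's short metre: 1 cm lost on each of 30 metres.
    print(30 * 1)                                     # 30 (centimetres)
```

With the biased rule the total error keeps growing as items are added, whereas with the unbiased rule it stays small in relation to the total.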

Measurement of errors. Errors may be measured either
absolutely or relatively. The arithmetic difference between
the estimated figures and the actual figures is called the
absolute error. Thus if the actual figure is 74,182 and the
estimated figure is 74,000, absolute error = 74,000 - 74,182
= -182.

The relative error is obtained by expressing the absolute
error as a fraction of the estimated total. Thus, ignoring the
sign, the relative error in the above case is 182/74,000 = .0025
(approximately). When the relative error is expressed by way
of a percentage it is known as percentage error. In the above
case the percentage error is .25.
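The three measures can be worked out mechanically. This is a small sketch with hypothetical function names; it keeps the sign of the error throughout, whereas the worked figures .0025 and .25 in the text are the corresponding magnitudes, rounded.

```python
def absolute_error(estimate, actual):
    """Estimated figure minus actual figure."""
    return estimate - actual


def relative_error(estimate, actual):
    """Absolute error as a fraction of the estimated figure."""
    return absolute_error(estimate, actual) / estimate


def percentage_error(estimate, actual):
    """Relative error expressed as a percentage."""
    return 100 * relative_error(estimate, actual)


if __name__ == "__main__":
    estimate, actual = 74_000, 74_182
    print(absolute_error(estimate, actual))               # -182
    print(round(relative_error(estimate, actual), 4))     # about -0.0025
    print(round(percentage_error(estimate, actual), 2))   # about -0.25
```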





EXERCISES 

1. ‘Statistical Methods include all those devices of analysis and synthesis
by means of which statistics are scientifically collected and used to
explain or describe phenomena either in individual or related
capacities.’ (Secrist). Elucidate the statement.

2. Explain clearly what you understand of statistics. Discuss its scope.

3. In what ways can statistical methods be misused by interested persons?
Give at least two examples of the misuse of statistics.

4. Comment on the following definitions of statistics:

(a) ‘Statistics is the science of counting.’ (Bowley)

(b) ‘Statistics is the science of estimates and probabilities.’ (Boddington)

(c) ‘By statistics we mean quantitative data affected to a marked
extent by a multiplicity of causes.’ (Yule)

(d) ‘Statistics is the science of the measurement of the social organism,
regarded as a whole, in all its manifestations.’

(e) ‘Statistics is that branch of scientific methods, which deals with
the numerical aspects of aggregates of natural phenomena.’

(f) ‘Statistics is the science of averages.’

5. What do you mean by statistical methods ? 

6. Trace the gradual development of the science of statistics. 

7. 'Statistics was originally concerned with matters of State and was 
regarded as the Science of State Craft.’

Show, in the light of the above remarks, how statistics could have been
of use and necessity to State in ancient times. Is it of any utility today
also?

8. Discuss the scope, utility and limitations of statistics.

9. Describe briefly the different kinds of statistical methods and show the
usefulness of the knowledge of statistics to business executives,
economists, scientists and social reformers.

10. ‘Sciences without statistics bear no fruit, statistics without sciences
have no root.’ Explain.

11. What is a statistical inquiry? Describe the main stages in a statistical
inquiry.

12. What preliminary steps should be taken into consideration while
planning a statistical inquiry?

13. A cotton manufacturer in Delhi is anxious to tap new markets for his
goods in India and foreign countries. What statistical materials should
he collect?

14. What are statistical units of measurements ? Explain the various 
kinds of units. What precautions should be observed in specifying a 
unit ? 

15. What do you mean by primary data and secondary data?

16. ‘It is never safe to take published statistics at their face value without
knowing their meaning and limitations.’

Emphasise the practical importance of the above dictum. What general
rules would you lay down for a proper study and criticism of given
statistics?

17. What are the various methods of collecting primary data? Give
examples of each with relative merits and demerits.





18. Compare the merits and demerits of ‘Census Method’ and ‘Sample
Method’ of collecting statistics.

19. What is a ‘Questionnaire’? How does it differ from a blank form?
What precautions should be taken in drafting a questionnaire?

20. Write brief notes on:

(a) Law of Statistical Regularity.

(b) Law of Inertia of Large Numbers.

(c) Biased and Unbiased Error.

(d) Percentage Error.

21. Give examples to show that:

(a) Biased errors are cumulative and unbiased ones are compensating.

(b) Relative accuracy is more important to a statistician than absolute
accuracy.

(c) It is conventional accuracy of measurement that is generally
looked for.



Chapter 7 

Classification and Tabulation 


The data collected for the purpose of a statistical inquiry sometimes
consist of a few fairly simple figures which can be
easily understood without any kind of special treatment. But
more often there is an overwhelming mass of raw material and
detail without any form or structure. Data obtained from
primary sources obviously enough are in a raw state for they
have not gone through any statistical treatment. But the
secondary data too are no better in this respect inasmuch as
the form in which they are obtained is more often than not
unsuited for the purpose of the inquiry.

This unwieldy, unorganised and shapeless mass of
collected data is not capable of being rapidly or easily
assimilated or interpreted. At best only a hazy impression
and that too of a doubtful reliability may be obtained by its
perusal. Consider the marks listed in table 7.1, obtained
by the students of a class. Without reorganising these data,
a cursory survey yields a general impression that most of
the students obtained marks in-between 45 and 55. A more
laborious examination shows the lowest marks to be 14 and
the highest 74, so that the range may be stated as 74 - 14 = 60.
Beyond this it is difficult to go with the data in their present
form.

In order to make the data easily understandable, the first task of 
the statistician is to condense and simplify them in such a manner 
that irrelevant details are eliminated and their significant features 
stand out prominently. The procedure that is adopted for this purpose 
is known as the method of classification and tabulation. 

Classification 

To quote Connor, 'Classification is the process of arranging things 
(either actually or notionally) in groups or classes according to 
their resemblances and affinities, and gives expression to the unity 
of attributes that may subsist amongst a diversity of individuals.' 

The objects of classification may be stated as follows : 

1. To eliminate unnecessary details ; 

2. To bring out clearly points of similarity and dissimilarity ; 

3. To enable one to form mental pictures of objects of perception 
and conception ; and 

4. To enable one to make comparisons and draw inferences. 

Basis of classification. Statistical facts are classified according 
to their characteristics or attributes. Thus the students of a 
college may be classified according to their marital status, height, 
religion, etc. When a particular characteristic has been chosen for 
this purpose the next step in the process of classification would be 
to note the similarity and dissimilarity as regards this chosen 
characteristic in the various items. Items that are alike in respect 
of this characteristic will be grouped together. Thus if the students 
are to be classified according to their marital status all married 
students would be put into one group and all unmarried in another. 
If the students are classified on the basis of religion there will be 
different groups for Hindus, Muslims, Sikhs, etc. When the 
classification is made according to heights, each group will include 
only those students whose height lies within a certain range. It will 
be noted that the three different characteristics (marital status, 
height and religion) give us groups that are significantly different 
from one another. Thus in the first case we have groups where the 
characteristic is present or absent, e.g., married or unmarried ; in 
the second we have groups where the characteristic is of differing 
quality, e.g., students may be Hindus, Muslims or Sikhs. In the third 
illustration we have groups where the characteristic is present in 
different degrees, e.g., there will be a group of students whose 
height is between 150 cm. and 158 cm., another group whose height is 
between 158 cm. and 165 cm., and so on. 




Characteristics may be broadly divided into two categories : 
(i) Qualitative, and (ii) Quantitative. 

(i) Qualitative characteristics are those which are not capable of 
being described numerically, e.g., sex, nationality, colour of the 
eye, etc. These characteristics are called 'attributes' or 
'attributive variates' or 'descriptive characteristics'. When 
classification is to be made on the basis of attributes, groups are 
differentiated either by the presence and absence of the attribute 
(e.g., Muslims and Non-Muslims) or by its differing qualities. The 
qualities of an attribute can easily be differentiated by means of 
some natural or physical line of demarcation, and their natural 
differences determine the group into which a particular item is to be 
placed. Thus if we select 'colour of the eye' as the basis of 
classification there will be a group of 'brown eyed' people, another 
of 'blue eyed' people, and so on. If the data are classified on the 
basis of one attribute only the process is termed as simple 
classification. In cases where more than one attribute is studied, 
resulting in a subdivision of classes, the classification is known as 
manifold. Thus the population of a city may be divided into literate 
and illiterate. Literate persons may again be divided into literate 
males and literate females. The following illustration depicts an 
example of manifold classification : 


                 Hindus      Married 
                             Unmarried 
                                            And all these may be 
 Population      Muslims     Married        further subdivided into 
                             Unmarried      1. Adults 
                                            2. Adolescents 
                 Sikhs       Married        3. Children 
                             Unmarried 


(ii) Quantitative characteristics are those that can be numerically 
described, such as height, weight, turnover, exports, test scores, 
etc. These are called 'quantitative variates' or simply 'variables'. 
'Thus by a variable is meant a quantity which assumes different 
values that may be measured in some approximate unit.' 

Such characteristics can be classified only by assigning them 
arbitrary limits. Thus the marks obtained by the students may be 
classified : (i) distinction marks, (ii) first class marks, 
(iii) second class marks, (iv) third class marks, and (v) fail 
marks, etc. 

Statistical Series 

A series is a systematic arrangement of items. When statistical data 
have been arranged in a systematic manner, it is called a statistical 
series. The arrangement may be (i) according to magnitude, i.e., the 
items may be arranged in an ascending or descending manner ; 
(ii) according to time, i.e., items may be arranged according to time 
of the occurrence ; or (iii) according to their geographical 
location. When the arrangement is on the basis of magnitude the 
series takes the form of a frequency distribution. If the arrangement 
is according to time or space the series so obtained are respectively 
called time and spatial series. 

Continuous and Discrete Series 

A variable may be (i) continuous, or (ii) discrete. If a variable can 
take any numerical value within a certain range it is called a 
continuous variable. Such, for example, are heights, weights, 
rainfall records, barometric readings, etc. In measuring the height 
of university students it is possible to come across any measure 
(say) between 167 cm. and 170 cm. ; there may, for example, be a 
student whose height is 167.12034 cm. The point is only that any 
conceivable height within this range can occur. If we prepare a list 
of all persons in India whose height is between 167 cm. and 170 cm. 
it is quite possible that for every height (say 167.001 cm., 
167.002 cm. and so on) there will be a person available who has it. 
Thus a variable is said to be continuous when 'it may pass from one 
value to the next by infinitely small gradation.' If a man says he 
has driven 50 kilometres, we understand that the distance is 
approximately 50 kilometres. In order to increase his distance from 
50 to 51, he must pass through all the infinitely small gradations of 
distance between the two. Distances cannot increase at a leap by a 
whole unit discretely and must increase continuously. Series in which 
the variables are continuous are known as continuous series. 

Variables which take only discrete or exact values are called 
discrete, e.g., test scores, family members, etc. There cannot be 
test scores in fractions and the members in a family will always be 
integers, i.e., 1, 2, 3, 4 and so on. Thus a variable is said to be 
discrete when there are gaps between one value and the next. A count 
of the number of students in a class room may yield, say, 60. If one 
more student enters, the count leaps from 60 to 61 without passing 
through any intermediate fractional values. At no time is it 
'slightly under 60' or 'a little more than 60'. The count is exact, 
and proceeds by a series of leaps from integer to integer. Series in 
which variables are discrete are known as discrete or discontinuous 
series. 

Time, Spatial or Condition Series 

At this stage it is also necessary to distinguish between time series 
and the series where the 'time' factor does not enter. Series which 
give measurements of the change in a variable at different periods 
are called time series. If we give the figures of annual turnover of 
a firm for the last (say) ten years, the series that will be formed 
by these figures will be called a time series. The following is an 
illustration of time series : 

Figures showing cotton production in India in thousand bales of 
400 lb. net during 1942-47 : 

      Year            Total of all Staples 

    1942-43                  4,702 
      43-44                  5,259 
      44-45                  3,560 
      45-46                  3,530 
      46-47                  3,531 





If, on the other hand, a series is comprised of the changes in the 
value of a variable from individual to individual at the same period 
of time and under the same general conditions, it may be a spatial or 
a condition series. It differs from the time series in the sense that 
here 'time is not a factor of any material importance'. If the wages 
of the workers at a certain time in the same trade for different 
localities are given, it will be an example of a series where the 
time factor does not enter. As an illustration of spatial series we 
give below the distribution of area under cotton, variety-wise, in 
1951-52 : 


    Variety                 Area 
                       (thousand acres) 

    Bengals                   987 
    Americans                 878 
    Oomras                  3,929 
    Broach                    775 
    Surti                     268 
    Dholleras               1,266 
    Others                  4,180 


Now, the problems arising in the analysis of time series are 
different from those involved in the organisation and analysis of 
series where the time factor does not enter. While studying series of 
the former kind our main objective is to measure and analyse the 
chronological variations in the size of a variable. We may, for 
example, study the changes in the wholesale prices over a period of 
years, or the variation in the turnover of a firm over a period of 
years. 

The problem in the organisation of series where time is not a factor 
is to determine the number of times each value of a variable is 
repeated and how these values are distributed. The problems of 
analysis of these two kinds of series are thus different and as such 
there are different methods that are used with regard to each one of 
them. 

In this chapter we will discuss the organisation of the data where 
time element is not present. 








The Array 

The first thing that is to be done in the matter of arranging the 
collected data is to prepare an 'Array'. If time is an important 
factor in the collected data, the array is prepared according to 
time, i.e., the values of the variable are arranged in a 
chronological order. If, for example, we have the data of the 
turnover of a business for the last five years the array will be as 
follows : 

                                          Rs. 

    Year ending March 1949    ...       20,350 
     ,,    ,,     ,,  1950    ...       25,21[?] 
     ,,    ,,     ,,  1951    ...       19,317 
     ,,    ,,     ,,  1952    ...       26,218 
     ,,    ,,     ,,  1953    ...       23,190 


The array that we obtain is called a time series. But if in the 
collected data time is not an important factor, the array is prepared 
by arranging the values of the variable in an ascending or descending 
order. This will enable us to know the range over which the items are 
spread and we will also get an idea of their general distribution. 
Table 7.1 gives the data in their original form. When arranged they 
take the form of table 7.2. 
TABLE 7.1 

Data as Originally Collected 

Marks in 'Statistics' obtained by 105 Students of a College 

    40  57  61  67  56  70  39  46  63  41  60  38  39  40  51 
    37  40  72  39  50  43  41  25  42  38  40  50  30  33  54 
    58  14  71  55  33  65  55  66  40  62  54  39  43  55  38 
    40  20  69  41  43  49  59  73  28  47  59  33  33  52  68 
    38  48  45  71  44  52  45  56  64  51  59  47  46  57  65 
    39  73  28  30  46  54  56  44  14  62  47  52  50  50  50 
    23  41  74  53  47  52  61  42  28  40  56  63  56  66  30 









TABLE 7.2 

Data Arranged in Order of Size 

Marks in 'Statistics' obtained by 105 Students of a College 

    14    38    42    50    56    64 
    14    38    43    50    56    65 
    20    39    43    50    56    65 
    23    39    44    50    56    66 
    25    39    44    50    57    66 
    28    39    45    51    58    67 
    28    39    45    51    58    68 
    28    40    46    52    59    68 
    30    40    46    52    59    69 
    30    40    46    52    59    70 
    30    40    47    52    59    71 
    33    40    47    54    60    71 
    33    40    47    54    61    72 
    33    40    47    54    61    73 
    37    41    48    55    62    73 
    37    41    48    55    62    74 
    38    41    49    55    63 
    38    41 
    38    42 






An inspection of table 7.2 provides a much clearer impression of the 
distribution of marks than could be obtained from the unorganised 
marks in the original compilation. The highest and lowest marks are 
immediately seen and the middle marks, or mid-marks, can be readily 
obtained by counting from either end. The marks which occur most 
frequently are readily identified as 40 (occurring 7 times), and 38, 
39, and 50 (appearing 5 times each). No other marks are as common as 
these. 
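
The reading of an array described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the list ARRAY is transcribed from table 7.2 (with the fifth entry read as 25, in agreement with tables 7.1 and 7.3), and the variable names are ours, not the author's.

```python
from collections import Counter

# The 105 marks of table 7.2, read down the columns (assumed transcription).
ARRAY = [
    14, 14, 20, 23, 25, 28, 28, 28, 30, 30, 30, 33, 33, 33, 37, 37, 38, 38, 38,
    38, 38, 39, 39, 39, 39, 39, 40, 40, 40, 40, 40, 40, 40, 41, 41, 41, 41, 42,
    42, 43, 43, 44, 44, 45, 45, 46, 46, 46, 47, 47, 47, 47, 48, 48, 49,
    50, 50, 50, 50, 50, 51, 51, 52, 52, 52, 52, 54, 54, 54, 55, 55, 55,
    56, 56, 56, 56, 57, 58, 58, 59, 59, 59, 59, 60, 61, 61, 62, 62, 63,
    64, 65, 65, 66, 66, 67, 68, 68, 69, 70, 71, 71, 72, 73, 73, 74,
]

array = sorted(ARRAY)                 # an array is simply the ordered data
lowest, highest = array[0], array[-1]
marks_range = highest - lowest        # highest minus lowest marks
mid_mark = array[len(array) // 2]     # the 53rd of the 105 marks
most_common = Counter(array).most_common(4)

print("lowest:", lowest, "highest:", highest, "range:", marks_range)
print("mid-mark:", mid_mark)
print("most frequent (mark, times):", most_common)
```

Counting from either end of the ordered list gives the same mid-mark because the number of items is odd.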

The tabulation shown in table 7.2, however, is rather difficult to 
prepare, cumbersome to print, and not particularly effective in its 
presentation. After arranging the data as in table 7.2 their bulk 
must be reduced so that the 'eye can take them in easily, the mind 
comprehend them, and computational method deal with them 
efficiently'. A first step in such a condensation would be achieved 
by representing the repetitions of a particular mark by tallies 
instead of rewriting the 'marks' itself, as in table 7.3. The number 
of tallies corresponding to any given marks is the frequency of that 
'marks', usually represented by the letter f. The traditional method 
of tallying is to record the frequencies by marks until four have 
been made, then to make a cross mark for the fifth score. This 
procedure makes up the preliminary sheet. (See table 7.3 below.) 

TABLE 7.3 

Preliminary sheet showing the marks obtained by 105 students 

    Marks   Tallies     Frequency      Marks   Tallies     Frequency 

     14     ||              2           45     ||              2 
     15                                 46     |||             3 
     16                                 47     ||||            4 
     17                                 48     ||              2 
     18                                 49     |               1 
     19                                 50     ||||/           5 
     20     |               1           51     ||              2 
     21                                 52     ||||            4 
     22                                 53 
     23     |               1           54     |||             3 
     24                                 55     |||             3 
     25     |               1           56     ||||            4 
     26                                 57     |               1 
     27                                 58     ||              2 
     28     |||             3           59     ||||            4 
     29                                 60     |               1 
     30     |||             3           61     ||              2 
     31                                 62     ||              2 
     32                                 63     |               1 
     33     |||             3           64     |               1 
     34                                 65     ||              2 
     35                                 66     ||              2 
     36                                 67     |               1 
     37     ||              2           68     ||              2 
     38     ||||/           5           69     |               1 
     39     ||||/           5           70     |               1 
     40     ||||/ ||        7           71     ||              2 
     41     ||||            4           72     |               1 
     42     ||              2           73     ||              2 
     43     ||              2           74     |               1 
     44     ||              2 







By breaking down the data into the form of table 7.3 much more 
information becomes apparent. A superficial scrutiny of table 7.3 
reveals that marks between 40 and 60 occur the largest number of 
times. Such a table is known as a frequency distribution. It is so 
called because it shows the frequency or the number of times each 
individual figure occurs. 
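
The tallying procedure lends itself to a short Python sketch. It assumes the array of table 7.2 as its input (transcribed here with the fifth entry read as 25, as tables 7.1 and 7.3 indicate); the helper name `tally` and the convention of writing a crossed group of five as `||||/` are our own choices for plain text.

```python
from collections import Counter

# The 105 marks of table 7.2, read down the columns (assumed transcription).
ARRAY = [
    14, 14, 20, 23, 25, 28, 28, 28, 30, 30, 30, 33, 33, 33, 37, 37, 38, 38, 38,
    38, 38, 39, 39, 39, 39, 39, 40, 40, 40, 40, 40, 40, 40, 41, 41, 41, 41, 42,
    42, 43, 43, 44, 44, 45, 45, 46, 46, 46, 47, 47, 47, 47, 48, 48, 49,
    50, 50, 50, 50, 50, 51, 51, 52, 52, 52, 52, 54, 54, 54, 55, 55, 55,
    56, 56, 56, 56, 57, 58, 58, 59, 59, 59, 59, 60, 61, 61, 62, 62, 63,
    64, 65, 65, 66, 66, 67, 68, 68, 69, 70, 71, 71, 72, 73, 73, 74,
]


def tally(n):
    """Write n as tally marks; each completed group of five is '||||/'."""
    fives, rest = divmod(n, 5)
    groups = ["||||/"] * fives
    if rest:
        groups.append("|" * rest)
    return " ".join(groups)


# The frequency f of a mark is the number of times it is repeated.
freq = Counter(ARRAY)

# Preliminary sheet: one row for every mark from the lowest to the highest.
for mark in range(min(ARRAY), max(ARRAY) + 1):
    f = freq[mark]
    print(f"{mark:>4}   {tally(f):<10} {f if f else ''}")
```

A `Counter` returns zero for marks that never occur, so the empty rows of the preliminary sheet (15 to 19, 53 and so on) need no special handling.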

Even after so much simplification, the data contain too many figures. 
Their bulk can be further reduced by constructing a frequency table 
in the following manner : 

TABLE 7.4 

Frequency Distribution Showing the Marks Obtained by 105 Students 

    Marks       Tallies                                    Frequency 

     6 to 15    ||                                              2 
    16 to 25    |||                                             3 
    26 to 35    ||||/ ||||                                      9 
    36 to 45    ||||/ ||||/ ||||/ ||||/ ||||/ ||||/ |          31 
    46 to 55    ||||/ ||||/ ||||/ ||||/ ||||/ ||               27 
    56 to 65    ||||/ ||||/ ||||/ ||||/                        20 
    66 to 75    ||||/ ||||/ |||                                13 

                                                    Total     105 
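
As a rough check on table 7.4, the grouping can be reproduced in Python. The sketch below assumes the frequencies of table 7.3 as its starting point; the function name `grouped` and its parameters are illustrative only.

```python
# Frequency of each mark, transcribed from table 7.3 (mark: frequency).
FREQ = {
    14: 2, 20: 1, 23: 1, 25: 1, 28: 3, 30: 3, 33: 3, 37: 2, 38: 5, 39: 5,
    40: 7, 41: 4, 42: 2, 43: 2, 44: 2, 45: 2, 46: 3, 47: 4, 48: 2, 49: 1,
    50: 5, 51: 2, 52: 4, 54: 3, 55: 3, 56: 4, 57: 1, 58: 2, 59: 4, 60: 1,
    61: 2, 62: 2, 63: 1, 64: 1, 65: 2, 66: 2, 67: 1, 68: 2, 69: 1, 70: 1,
    71: 2, 72: 1, 73: 2, 74: 1,
}


def grouped(freq, lower, width, n_classes):
    """Group marks into classes that include both of their limits."""
    table = []
    for k in range(n_classes):
        lo = lower + k * width
        hi = lo + width - 1          # 6 to 15 holds the ten marks 6, 7, ..., 15
        count = sum(f for mark, f in freq.items() if lo <= mark <= hi)
        table.append((lo, hi, count))
    return table


table_7_4 = grouped(FREQ, lower=6, width=10, n_classes=7)
for lo, hi, count in table_7_4:
    print(f"{lo:>2} to {hi:<2}   {count:>3}")
print("Total     ", sum(count for _, _, count in table_7_4))
```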


The data in the above form are sometimes described as a 'grouped' 
frequency distribution. The range (difference between lowest and 
highest marks) is subdivided into smaller compartments or groups, 
usually termed 'classes'. In the above illustration each class 
comprises ten marks. Thus the first class, 6 to 15, includes all 
those boys whose marks are 6 or 15, or anywhere between them. 6 and 
15 are respectively termed as lower and upper limits of the first 
class. When items having sizes equal to the lower limit as well as 
the upper limit of a class are included in the frequency of that very 
class (as in table 7.4), the classes are known as 'inclusive'. When, 
however, the items equal to the size of either the lower or the upper 
limit are excluded from the frequency of that class it is known as 
'exclusive'. 

The data given in table 7.3 will take the following shape if the 
classes are exclusive : 


TABLE 7.5 

    Marks                          Frequency 
    (lower limit exclusive) 

     5 to 15                            2 
    15 to 25                            3 
    25 to 35                            9 
    35 to 45                           31 
    45 to 55                           27 
    55 to 65                           20 
    65 to 75                           13 

                           Total      105 

In the above frequency distribution the lower limit is excluded, 
i.e., the items having a size equal to the lower limit of a class are 
included in the next lower class ; whereas those equal to the upper 
limit of a class are included in that class itself. Alternately, we 
may also exclude the upper limit of the class. Under this arrangement 
the data would appear as : 


TABLE 7.6 

    Marks                          Frequency 
    (upper limit exclusive) 

     5 - 15                             2 
    15 - 25                             2 
    25 - 35                            10 
    35 - 45                            29 
    45 - 55                            26 
    55 - 65                            21 
    65 - 75                            15 

                           Total      105 
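
The two exclusive conventions differ only in which side of each class is left open, and a small Python sketch makes the difference concrete. The frequencies are those of table 7.3; the function `exclusive_counts` and the labels "lower" and "upper" are assumptions made for this illustration.

```python
# Frequency of each mark, transcribed from table 7.3 (mark: frequency).
FREQ = {
    14: 2, 20: 1, 23: 1, 25: 1, 28: 3, 30: 3, 33: 3, 37: 2, 38: 5, 39: 5,
    40: 7, 41: 4, 42: 2, 43: 2, 44: 2, 45: 2, 46: 3, 47: 4, 48: 2, 49: 1,
    50: 5, 51: 2, 52: 4, 54: 3, 55: 3, 56: 4, 57: 1, 58: 2, 59: 4, 60: 1,
    61: 2, 62: 2, 63: 1, 64: 1, 65: 2, 66: 2, 67: 1, 68: 2, 69: 1, 70: 1,
    71: 2, 72: 1, 73: 2, 74: 1,
}

LIMITS = list(range(5, 76, 10))      # class limits 5, 15, 25, ..., 75


def in_class(mark, lo, hi, excluded):
    """Decide membership when one of the two limits is excluded."""
    if excluded == "lower":
        return lo < mark <= hi       # a mark of 15 falls in 5 to 15
    if excluded == "upper":
        return lo <= mark < hi       # a mark of 15 falls in 15 to 25
    raise ValueError("excluded must be 'lower' or 'upper'")


def exclusive_counts(freq, limits, excluded):
    """Class frequencies for consecutive exclusive classes."""
    return [
        sum(f for mark, f in freq.items() if in_class(mark, lo, hi, excluded))
        for lo, hi in zip(limits, limits[1:])
    ]


lower_excluded = exclusive_counts(FREQ, LIMITS, "lower")   # as in table 7.5
upper_excluded = exclusive_counts(FREQ, LIMITS, "upper")   # as in table 7.6
print(lower_excluded)
print(upper_excluded)
```

Only the marks that coincide with a class limit (25, 45, 55 and 65 in these data) change class between the two arrangements, which is why the two tables differ while their totals agree.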









The same data can also be presented in the form of a cumulative 
frequency distribution. The cumulation may be upward or downward. 
Both types of cumulated data are shown below : 


TABLE 7.7 

    Marks                                    Frequency 

    Exceeding or above or more than  5          105 
        ,,        ,,        ,,      15          103 
        ,,        ,,        ,,      25          100 
        ,,        ,,        ,,      35           91 
        ,,        ,,        ,,      45           60 
        ,,        ,,        ,,      55           33 
        ,,        ,,        ,,      65           13 
        ,,        ,,        ,,      75            0 


TABLE 7.8 

    Marks                 Frequency 

    Less than  5              0 
       ,,     15              2 
       ,,     25              4 
       ,,     35             14 
       ,,     45             43 
       ,,     55             69 
       ,,     65             90 
       ,,     75            105 
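
Both cumulations follow from the class frequencies by running totals, as the following Python sketch suggests. It takes the frequencies of tables 7.5 and 7.6 as given; the variable names are ours.

```python
from itertools import accumulate

LIMITS = [5, 15, 25, 35, 45, 55, 65, 75]

# Class frequencies with the lower limit excluded (table 7.5) and with the
# upper limit excluded (table 7.6).
freq_lower_excluded = [2, 3, 9, 31, 27, 20, 13]
freq_upper_excluded = [2, 2, 10, 29, 26, 21, 15]

total = sum(freq_lower_excluded)

# "More than L": everything except the classes lying at or below L.
more_than = [total - c for c in accumulate([0] + freq_lower_excluded)]

# "Less than L": running total of the classes lying below L.
less_than = list(accumulate([0] + freq_upper_excluded))

for limit, more, less in zip(LIMITS, more_than, less_than):
    print(f"more than {limit:>2}: {more:>3}    less than {limit:>2}: {less:>3}")
```

The "more than" column is built from the classes that exclude their lower limit and the "less than" column from those that exclude their upper limit, so that a mark equal to a limit is counted on the correct side in each case.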


The Selection of Classes 


The quality of a frequency distribution is ultimately determined by a 
wise choice of the number and the width of the classes. No hard and 
fast rule can be given for determining the number of classes. The 
problem of deciding the number is directly connected with the second 
problem, viz., the problem of fixing the size or width of the 
classes. The width of a class is termed as 'class interval'. (Thus in 
the above example class interval is 10.) In the selection of classes 
the following points need attention : 

1. The number of classes should seldom be less than 10 or more than 
20, and 15 generally is a good number. If we divide the range of the 
series by 15, then the resultant quotient (rounded off to the nearest 
integer) will provide a helpful suggestion as to the size of class 
interval. 

2. Intervals in multiples of 5 are convenient and facilitate 
computation. As far as possible, class intervals should be uniform in 
width. Unequal class intervals should be avoided. 

3. In general, an interval with an odd number of units is easier to 
work with than one with an even number. The class interval with an 
odd number of units has the advantage of having an integer as its 
mid-point, whereas the class interval with an even number of units 
has a fraction as its mid-point. That is why odd numbers are 
preferred. 

4. As far as possible open-end classes should be avoided. 

5. Class limits should be unambiguous and mutually exclusive. 

6. Lastly, the starting point should be adjusted in the light of any 
natural groupings of the data, so as to avoid distortion. This is 
necessary because in the interpretation of a frequency table and in 
subsequent calculations based upon it, the mid-point of each class is 
taken to represent the value of all items that constitute the 
frequency of that class. 'Thus an arrangement by which several high 
frequencies are placed at the very bottom of an interval with no 
compensating high frequencies at the top, or vice versa, would cause 
a slight error in the outcome of computations.'1 
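
The first two points can be tried out on the marks of table 7.2. The sketch below is illustrative only: the function names are hypothetical, and snapping the quotient to a multiple of 5 is just one way of applying point 2.

```python
def suggested_interval(lowest, highest, classes=15):
    """Range divided by the desired number of classes, to the nearest integer.

    Python's round() sends exact halves to the nearest even integer.
    """
    return round((highest - lowest) / classes)


def nearest_multiple_of_5(width):
    """Snap a width to the nearest multiple of 5, never going below 5."""
    return max(5, 5 * round(width / 5))


# Marks of table 7.2: lowest 14, highest 74.
quotient = suggested_interval(14, 74)
convenient = nearest_multiple_of_5(quotient)
print("range / 15 =", quotient)
print("convenient class interval =", convenient)
```

The illustration in tables 7.4 to 7.6 uses a wider interval of 10, which gives fewer classes than the rule suggests.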

Methods of Designating Class Interval 

There are a variety of ways in which class interval may be 
designated : 

1 Walker, H. M. : Elementary Statistical Methods. 






      (i)          (ii)         (iii)        (iv) 
    M. Pts. 

      15         13 - 17        13 -         - 17 
      20         18 - 22        18 -         - 22 
      25         23 - 27        23 -         - 27 
      30         28 - 32        28 -         - 32 
      35         33 - 37        33 -         - 37 
      40         38 - 42        38 -         - 42 


            (i)                       (ii)                   (iii) 

     5 and less than 10      Above  5 and up to 10             -  50 
    10 and less than 15      Above 10 and up to 15          50 - 100 
    15 and less than 20      Above 15 and up to 20         100 - 150 
    20 and less than 25      Above 20 and up to 25         150 - 200 
    25 and less than 30      Above 25 and up to 30         200 - 250 
    30 and less than 35      Above 30 and up to 35         250 - 



Frequency Distribution : Continuous Variable 

The problem of constructing a frequency table of marks scored by 
students is simplified by the fact that marks are invariably 
expressed only in integral values. When, however, the data refer to a 
continuous variable, they are generally measured to the nearest 
fraction of a whole unit. Height of individuals, for instance, may be 
measured to the tenth of an inch. Again, the range of these measures 
may be so small that in order to get sufficient discrimination 
between the measures in the frequency distribution a class interval 
of a fraction of the unit must be used. For example the heights of 
the students of a class may be spread over a range of only two inches 
and shall have to be classified into intervals of one-quarter of an 
inch. 

When the measures to be classified have been determined to the 
nearest multiple of a given fraction of unit (say, one-tenth of an 
inch, or sixteenth of a seer) the following rules should be adopted 
for preparing a frequency table : 

1. Determine the range and divide it by the number of classes 
desired. The size of the interval should be made equal to that 
convenient multiple of the given fraction which is nearest this 
quotient. This rule can be understood by the following illustration. 
Suppose the height of students of a class has been measured to the 
nearest tenth of an inch, and that the range of the distribution is 
6.8 inches. One-fifteenth of this would be .453 inch. This is rounded 
off to the nearest tenth of an inch, or to .5 inch. This value would 
be used as the class interval. The interval should be, as far as 
possible, an odd multiple of the given fraction. 

2. The lower and the upper limits of the class interval should be 
multiples of the given fraction. The mid-point should, as far as 
possible, be a multiple of the class interval. If the height of the 
shortest individual is 64.3 inches and of the tallest is 71.1, the 
mid-points would be 64.5, 65.0, 65.5, 66.0, 66.5 and so on up to 71.0 
and the classes will be 64.3 - 64.7, 64.8 - 65.2 and so on up to 
70.8 - 71.2. 
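
The two rules can be followed step by step in Python. The sketch below works in whole tenths of an inch so that no rounding error creeps in; it takes the tallest height as 71.1 inches, which agrees with the stated range of 6.8 inches, and all names are our own.

```python
UNIT = 10                        # heights are recorded in tenths of an inch
lowest, highest = 643, 711       # 64.3 and 71.1 inches, expressed in tenths

# Rule 1: one-fifteenth of the range, to the nearest whole tenth.
spread = highest - lowest        # 68 tenths, i.e. 6.8 inches
width = round(spread / 15)       # 4.53 tenths becomes 5 tenths, i.e. .5 inch

# Rule 2: limits are multiples of the tenth; with an odd width the
# mid-point is itself a recorded value.
classes = []
lo = lowest
while lo <= highest:
    hi = lo + width - 1          # upper limit is included in the class
    mid = (lo + hi) // 2
    classes.append((lo, hi, mid))
    lo += width

for lo, hi, mid in classes:
    print(f"{lo / UNIT:.1f} - {hi / UNIT:.1f}    mid-point {mid / UNIT:.1f}")
```

Because the interval is an odd multiple of the tenth, every mid-point falls exactly on a tenth, which is the advantage the rule has in view.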

Mid-point of a Class Interval and the Determination of 
Real Limit 

The determination of the mid-point of a class interval is necessary 
for the computation of a number of statistical constants. The 
position of the mid-point is determined by real as distinguished from 
apparent class limits. 

If a frequency table records the distribution of a discrete variable 
the real and the apparent class limits are the same (unless the class 
interval is exclusive). This is due to the fact that discrete data 
are always expressed in whole numbers and are always characterised by 
gaps at which no measure may ever be found. Thus if the class 
intervals of a discrete variable are 

     6 - 10 
    11 - 15 
    16 - 20 

the apparent limits 6 and 10, 11 and 15, 16 and 20 are real limits 
also. The class 6 - 10 includes only those items whose sizes are 6, 
7, 8, 9 or 10. Any item whose size is more than 10, i.e., 11, 12, 
etc., or less than 6, i.e., 5, is not included in this class, but in 
the next higher or the next lower class. There are of course no 
values between 10 and 11, or 15 and 16. In such a case, therefore, 
the mid-point is the middle of the five values included in a class, 
viz., 8 in 6 - 10 class, 13 in 11 - 15 class and 18 in 16 - 20 class. 

If, however, the class interval is exclusive, the apparent limits are 
not real and before finding the mid-point real limits should be 
determined. If the class interval is given in the following manner it 
is said to be exclusive class interval : 

    (i)    5 - 10 
    (ii)  10 - 15 
    (iii) 15 - 20 

This means that an item having a value 15 is to be included either in 
class (ii) or class (iii). If it is included in class (ii) it means 
value 10 is included in class (i). Hence the real limits of class 
(ii) are 11 - 15 and their mid-point 13. If 15 is not included in 
class (ii) but in class (iii) the real limits of class (ii) are 
10 - 14 and the mid-point is 12. It, therefore, follows that whenever 
we have an exclusive class interval we must decide as to which limit 
of the class is excluded and it is only then that the mid-point 
should be ascertained. 
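
For a discrete variable the decision described above reduces to moving one limit by a whole unit. A minimal Python sketch, with hypothetical function names, is:

```python
def real_limits(lower, upper, excluded):
    """Real limits of an exclusive class of a discrete (whole-number) variable."""
    if excluded == "lower":
        return lower + 1, upper      # 10 - 15 really covers 11, 12, ..., 15
    if excluded == "upper":
        return lower, upper - 1      # 10 - 15 really covers 10, 11, ..., 14
    raise ValueError("excluded must be 'lower' or 'upper'")


def mid_point(lower, upper):
    """Middle of the real limits."""
    return (lower + upper) / 2


for excluded in ("lower", "upper"):
    lo, hi = real_limits(10, 15, excluded)
    print(f"{excluded} limit excluded: real limits {lo} - {hi}, "
          f"mid-point {mid_point(lo, hi)}")
```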

Continuous Series 

Any characteristic in which items may differ by insignificant amounts 
if proper measuring instruments are employed is regarded as a 
continuous variable. Heights of school children, for example, may be 
measured in as fine units as we please and between a certain range 
there is no point along the scale of heights at which we may not find 
the height of some student. 

Though theoretically such variables can be measured to an 
infinitesimal fraction of a unit, the measures that are obtained are 
only approximations to absolute accuracy. While measuring the weight 
of boys, for example, we seldom go to a unit smaller than the pound. 
Thus when we say that the weight of an individual is 140 pounds what 
we really mean is that his weight is nearer 140 pounds than 139 or 
141 pounds. This means that it is somewhere between 139.5 and 140.5 
pounds. 

From this it follows that if in any frequency distribution of weights 
we find a class interval identified by the interval limit (say, 
140 - 144) we must conclude (i) that weights have been measured 
correct to the nearest pound, and (ii) hence the real limits of the 
interval extend by .5 pounds on either side and the class interval, 
strictly speaking, is 139.5 - 144.5. The mid-point of this class is 
to be determined from these limits. The method of finding the 
mid-value in this case is as follows : 

$$\text{Lower limit of the class} + \frac{\text{Upper limit} - \text{Lower limit}}{2} = 139.5 + \frac{5}{2} = 139.5 + 2.5 = 142.0$$

or 

$$\frac{\text{Upper limit} + \text{Lower limit}}{2} = \frac{139.5 + 144.5}{2} = \frac{284}{2} = 142.0.$$


If the weight has been measured correct to the nearest tenth of a 
pound we will have class intervals like the following : 

    140 - 144.9 
    145 - 149.9. 

On the basis of what has been said earlier the real limits are : 

    139.95 - 144.95 
    144.95 - 149.95. 

Here the mid-point will be 

$$\frac{139.95 + 144.95}{2} = 142.45, \text{ i.e., } 142.5.$$
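
For a continuous variable the real limits extend half a unit of measurement beyond the apparent limits, and the mid-point is taken from the real limits. The following Python sketch uses the standard `decimal` module to keep the arithmetic exact; the function names and the idea of passing the unit of measurement as a parameter are our own.

```python
from decimal import Decimal


def real_limits(lower, upper, unit):
    """Extend the apparent limits by half the unit of measurement on each side."""
    half = Decimal(unit) / 2
    return Decimal(lower) - half, Decimal(upper) + half


def mid_point(lower, upper):
    """Half the sum of the real limits."""
    return (lower + upper) / 2


# Weights measured to the nearest pound: apparent class 140 - 144.
lo, hi = real_limits("140", "144", "1")
print(lo, hi, mid_point(lo, hi))

# Weights measured to the nearest tenth of a pound: apparent class 140 - 144.9.
lo_tenth, hi_tenth = real_limits("140", "144.9", "0.1")
print(lo_tenth, hi_tenth, mid_point(lo_tenth, hi_tenth))
```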


Tabulation 


A statistical table is a logical and systematic organisation of data 
in columns and rows. Tabulation has been termed as the 'mechanical 
part of [illegible]'.† Its main object is to so arrange the physical 
presentation of quantitative facts that there may be no 
misinterpretation of their significance. 'Tabulation involves the 
orderly and systematic presentation of numerical data in a form 
designed to elucidate the problem under consideration.'1 

† Smith and Duncan. 

Statistical data arranged in tables have some definite advantages 
over those descriptively stated.2 (1) Tabulated data can be more 
easily understood and leave a more lasting impression than those 
which are not tabulated. (2) Tabulated data facilitate quick 
comparisons. (3) A tabular arrangement makes easier the summation of 
items and detection of errors and omissions. (4) A tabular 
arrangement makes it unnecessary to repeat explanatory phrases and 
headings. 

The above-mentioned objects and advantages of tabulation can be 
attained only if certain rules have been observed in the construction 
of tables. 

'To prepare a first class table, one must have a clear idea of the 
facts to be presented, the contrast to be stressed, the points upon 
which emphasis is to be placed, and lastly, a familiarity with the 
technique of preparation.'3 The facts, contrasts and emphasis vary 
from table to table, but the technique has a general application. In 
preparing a table attention should be given to the following 
points :† 

1. Title. A good title explains in brief and concise language 

(a) what the data are, 

(b) where the data are, 

(c) the classification principle, and 

(d) time period of the data. 

It is usually advisable to prepare the table first and write the 
title afterwards. 


1 Connor. 

2 Secrist. 

3 Harry Jerome : Statistical Method. 

† See Neiswanger. 






2. Captions and stubs. The titles of the columns are called 
'captions' and those of the rows 'stubs'. The wording of the heading 
should be brief. The box over the stub on the left of the table 
should give a description of the stub contents. The headings for 
columns should be in singular form, as : year, country, price, 
output. The most important positions in the table are the left-hand 
column and top row, and the advantage of position decreases gradually 
as we proceed to the right and towards the bottom. In time series the 
dates should be arranged left to right with the early years at the 
left ; or if the years appear in the stub, from top to bottom with 
the early years at the top. When it is desired to place the current 
items in a more prominent position a reverse arrangement may be 
followed. In order to facilitate understanding, groups of five or 
more years should be separated by a space. 

3. Totals, sub-totals and averages. A statistical table must contain 
sub-totals for each separate class of data and a grand total for all 
combined classes. 

4. Statistical units. Unit designations must be written at the top of 
the columns. If the last three figures of a whole number are omitted 
from the figures as given in the table, the following may be regarded 
as a suitable form : 

    ,000 acres    ,000 rupees    ,000 tons    ,000 bales 

5. Footnotes and definitions. It may happen that without footnotes 
the data may seem to tell a story which is quite different from the 
actual facts. Thus if we look into a table giving yearly figures of 
wheat production in India, the sudden fall in the figure for 1947 
would be misleading unless there is a footnote to point out that the 
figures for 1947 relate to India after partition. So whenever needed, 
footnotes and definitions should be noted at the bottom of the table. 

6. Source. A sixth requirement is about the source of the data. It 
must be mentioned in the footnotes. 

7. Simplicity. In order that the table may render maximum service, 
the goal of simplicity should be constantly kept in mind. 




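A Form of Table 

                             TITLE 
          (Description of Units and Year, Place, etc.) 

    ------------------------------------------------------------ 
     (Stub box)   |     (A) Caption      |     (B) Caption 
        (O)       |    (1)    |    (2)   |    (3)    |    (4) 
    ------------------------------------------------------------ 
     Stub   X     |           |          |           | 
            Y     |           |          |           | 
            Z     |           |          |           | 
    ------------------------------------------------------------ 
     Totals       |           |          |           | 
    ------------------------------------------------------------ 

    Note.  Any definition. 
           Any explanation. 
           Source from which derived. 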

The scheme shown above gives an abstraction of the mechanics
of tabulation. It shows the position of the title and description
of units above the table and for illustration designates four
columns numbered (1), (2), (3) and (4) and three rows,
lettered X, Y and Z.

The four columns are sub-columns: (1) and (2) are sub-columns
of column (A), and (3) and (4) of column (B). The
box of caption headings would appear in the spaces designated
(A) and (B) respectively and sub-caption headings would appear
in the spaces designated (1), (2), (3) and (4). Similarly the
three rows are described by stub headings X, Y and Z. The
space (O) is for the general description of the stub headings.
The position of totals and footnotes, etc., is also shown in the
table.¹

General Purpose and Special Purpose Tables

A general purpose table is one which is drawn up in the
ordinary course to show the characteristics of a wide range


¹ Smith and Duncan.








of item values. Tables appearing in Government of India
publications are general purpose tables.

When a table is to be read, that is, when it is to narrate a story,
it is called a special purpose table. It should be simple, otherwise
it will defeat its purpose. The object of a special purpose table
is to compress into a small space a lot of information the
narration of which in the text would be cumbersome and
exhausting to the reader. It is, in short, a method of
condensation, and it is of the utmost importance that, as
it tells so much in so small a compass, it tells it as clearly as
practicable.*


Original and Derivative Tables 

Original tables present the information in substantially the
same form as it was assembled, while derivative tables present
information which has passed through the statistical
machine at least once after its collection. A derivative table
thus implies some process of manipulation, such as grouping,
totalling, averaging or other operations of a mathematical nature.


Simple and Complex Tables

A table wherein only one characteristic is mentioned is
called a 'simple table'. If two characteristics are mentioned
the table is termed 'double tabulation'. If more than two
attributes or variables are studied, it is referred to as 'complex
tabulation'. This will be clear from tables 7.9 and 7.10.


* R. P. Falkner, quoted by Smith and Duncan.



TABLE 7.9

Displaced Persons by Years of Arrival

(M stands for Males and F for Females)

[Table body not legible in the source.]


TABLE 7.10

Towns Arranged Territorially with Population

(M stands for Males and F for Females)

[Table body not legible in the source.]


EXERCISES 

1. Describe the considerations which are to guide you in fixing the range,
the class interval and upper and lower limits of class intervals for a
frequency distribution.

2. Differentiate between the data which are spread over a time, and the
data which exist at a point of time.

3. Discuss the objects of classification of a raw mass of collected data.

4. Distinguish between

(a) Continuous series and discrete series.

(b) Exclusive and inclusive class intervals.

(c) Ordinary and cumulative frequencies.

(d) More than and less than frequency tables.

(e) Historical and non-historical series.

5. What is a statistical table? What are its objects and importance in a
general scheme of any statistical inquiry?

6. Describe what considerations are to guide you in constructing a statistical
table.

7. Prepare a blank table to give as much information as possible of the
summary results of the distribution of population according to sex and
four religions, at five age groups in different states of India.


8. The following is a record of weights of 70 students in lbs. Tabulate the
data in the form of a frequency distribution, taking the lowest class as
[class limits illegible in the source].

91 It-.7 

U2 

76 

7fi 

m 

96 

72 

«< f 


0«i |(Mt 

101 

04 

84 

i06 

91 

75 

9! 

u: f 

W? oi 

10 1 

00 

77 

105 

90 

86 

m 

un 

IH 12 

77 

lift 

O’) 

61 

99 

02 


lOfi 

ir ,»q 

02 

10" 

m 

76 

83 

ac 

ua* 

S 07 

02 <M 

7.1 

10ft 

ns 

as 

m 

91 

169 

07 

:i r m 

07 

R2 

104 

no 

m 

92 


H A., Trerint&n, 1 95#! J!. 1 ] 

9. The following are the wages in rupees of 70 workers and you are asked
to tabulate the same by grouping them by intervals of 10.


V2 

47 

57 

67 

62 

92 

117 

«7 

97 

102 

91 

61 

71 

ftl 

121 

10ft 

ft 5 

61 

9ft 

111 

r.« 

61 

71? 

9ft 

ill 

9ft 

12 ft 

lift 

68 

73 

92 

«2 

til? 

57 

^ "7 

ft2 

72 

92 

52 

12 

16 

46 

■11 

ft*'.’ 

nr, 

146 

96 

m 

-Hi 

26 

U4 

69 

79 

1M 

129 

24 

m 

99 

94 

04 

rs 

120 

1 15 

10 

:r. 

125 

105 

15 

75 

45 

n 21 





10. The following is the record of marks obtained by 90 candidates in an
examination. Form a frequency distribution.


ft4 


5ft 

72 

4 ft 

87 

70 

45 

83 

40 

73 

86 

77 

75 

73 

71 

M 

46 

35 

43 

33 

76 

95 

65 

74 

50 

65 

m 

57 

n 

36 

33 

ftl 

53 

63 

6ft 

47 

2ft 

37 

11 

m 

40 

21 

84 

34 

19 

35 

72 

44 

19 

51 

67 

58 

76 

5ft 

w 

37 

74 

46 

50 

18 

39 

27 

ft? 

13 

45 

m 

86 

m 

7ft 

21$ 

12 

71 

(a 

22 

41 

38 

27 

m 

31 

29 

83 

4? 

30 

1ft 

22 

33 

3ft 

80 

37 



(0. <4,, M&dmi % t!)’>$) (I 3J 


11. Following are the marks obtained by 24 students in a class test of
Statistics and Law. Represent the data by one frequency table.

No. of Students      :  1   2   3   4   5   6   7   8   9  10  11  12
Marks in Statistics  : 15   0   1   3  16   2  ia   5   4  17   6  1ft
Marks in Law         : 13   1   2   7   8  ft  12  ft  17  16   6  *8

No. of Students      : 13  14  15  16  17  18  19  20  21  22  23  24
Marks in Statistics  : 14  ft  fl   n  10  13   n  11  12  18  ft   7
Marks in Law         : 11   3   5   4  10  11  14   j  18  13  15   1


D' f >J 

12. Draft a form of tabulation to show (a) sex, (b) three ranks: supervisors,
assistants and clerks, (c) years: 1918 and 1943, (d) age groups:
18 years and under, over 18 but less than 55 years, over 55 years.

ill. A. t fyyit 

13. Prepare a blank table, showing the distribution of students of a college
according to age, class and residence for arranging (a) physical training,
and (b) tutorial classes.

tt Cam., Agra, / ( 1 . 7 ; 


14. Point out the mistakes made in the following blank table drawn to
show the distribution of population according to sex, age, civil condition
and literacy.


[Blank table not legible in the source. The surviving labels are the age
groups '50-75' and '75 and above' and the rows 'Literates' and 'Illiterates'.]

Reconstruct the above table.


15. Arrange, in a suitable tabular form, the following:

The food grains inquiry committee makes the following comparative
study of size of holding in the Eastern U.P. with the rest of U.P.:

In the 14 eastern districts of U.P. holdings below 2 acres account
for 20% of the area under all holdings comprising the total area of
12,280 (thousand acres); the corresponding figures for the rest of U.P.
are 11% and 20,056 (thousand acres). Similarly, the proportion of
area covered by holdings exceeding 2 acres but not exceeding 5 acres
to the area under all holdings is 29% in 14 districts and only
[figure illegible in the source]% in the rest of U.P. On the other hand,
the proportion of area covered by holdings exceeding 5 acres is much
greater in the rest of U.P. than in the 14 districts.

(A#. A &»., 0#Mi, ipjd) [1.91 
16. In a newspaper account, describing the incidence of influenza among
tubercular persons living in the same family, the following paragraph
appeared:

'Exactly a fifth of the 100,000 inhabitants showed signs of tuberculosis
and no fewer than 3,000 among them had an attack of influenza,
but among them only 1,000 lived in the infected houses. In contrast
with this 1/15th of the tuberculous persons who did not have influenza
were still exposed to infection. Altogether 21,000 were attacked by
influenza and 41,000 were exposed to risk of infection, but the number
who having influenza but not tuberculosis lived in houses where no
other cases of influenza occurred was only 2,000.'

Redraft the information in a concise and elegant tabular form.

(M.A., Delhi, 1957) [1.10]



Chapter 8 

Diagrammatic Representation 


'As the eye is the best judge of proportion, being able to estimate it
with more quickness and accuracy than any other of our organs, it follows
that wherever relative quantities are in question, a gradual increase or
decrease of any...value is to be stated, this mode (Diagrammatic) of
representing is peculiarly applicable; it gives a simple, accurate and
permanent idea, by giving form and shape to a number of separate ideas,
which are otherwise abstract and unconnected.' (William Playfair)

Figures are not always interesting, and as their size and
number increase they become confusing and uninteresting
to such an extent that no one (unless he is specially interested)
would care to study them. Their study is a great strain upon
the mind without, in most cases, any scientific result. The aim
of statistical methods, inter alia, is to reduce the size of
statistical data and to render them easily intelligible. To attain
this object the methods of classification, tabulation, averages,
percentages, and index numbers are generally used. But the
method of diagrammatic representation (visual aids) is probably
simpler and more easily understandable. It consists in
presenting statistical material in geometric figures, curves,
maps and pictures. The various forms that are so obtained
are called statistical diagrams.

Diagrams have greater attraction and memorizing value
than mere figures. They give delight to the eye, add a spark
of interest, and as such catch the attention as much as the
figures dispel it. If the population of the various states of
India is given to us in the form of a table it would entail some
mental concentration to find out the most thickly populated
states and to judge the variation in population as between
different states. But if the population is represented by means
of diagrams, say straight lines drawn according to a certain
scale (one line representing the population of one state and
so on), the required information can be derived in less
time and without any mental fatigue, and the impression on
the mind shall be of a lasting character. Such is the utility of
diagrams and hence the growing use of this device in many
popular books, magazines, and in newspaper advertisements.

The technique of diagrammatic representation is made 
use of only for purposes of comparison. It is not to be used 
when comparison is either not possible or is not necessary. If 
there is only an isolated figure the question of comparison does 
not arise and hence there is no sense in representing it dia- 
grammatically. Similarly the numerical facts that are not 
related to each other and, therefore, possess no common 
characteristic, should not be represented by diagrams. If we 
know the average height of an Indian, the average earning of 
an Englishman, the size of the family of an American, we will 
not be able to represent these numerical facts with the help of 
a diagram since there is nothing common between them. But 
if we are given the weekly earnings of workers in the different 
textile mills in the country the use of diagrams will facilitate 
the work of comparison. 

The method of diagrammatic representation is not an alternative
to tabulation. It is merely an aid to throw into relief and
clarify the conclusions that have been obtained by a process of 
tabulation. Thus classification and tabulation precede dia¬ 
grammatic representation. It may, however, be stated that 
charts and diagrams only strengthen the textual exposition of 
a subject ; they seldom serve as a complete substitute for 
statistical analysis. 

Rules for drawing diagrams

1. The first and the most important thing is the selection
of a proper scale. No definite rules can be laid down as
regards the selection of scale. But it may be said for the
guidance of students that the diagram should neither be too
large nor should it be so small that it may look clumsy or
indistinct. All the significant characteristics of the figures
should be clearly exhibited by the diagram, and it should also
suit the size of the paper.




2. The vertical and horizontal scales should be clearly
shown on the diagram itself, the former on the left hand
side and the latter at the bottom of the diagram.

3. Neatness should be strictly observed and the diagram
be drawn with the aid of geometrical instruments.

4. The heading should be written on the top in bold
letters and should be very explanatory.

5. Various shades of colours can be used to make it more 
attractive and to bring into prominence the main features of 
the data that are to be represented diagrammatically. 

All kinds of statistical series can be represented by diagrams
or charts. The technique of presenting time series and frequency
series in charts is called 'graphical representation' and
is discussed in a subsequent chapter. The method of presenting
categorical series in charts and diagrams is generally
termed 'diagrammatic representation', and is discussed in
the following pages.

Charting : Categorical Series 

For the interpretation of categorical data, tabular sum¬ 
marisation is not effective. Variations among the items are 
generally so extensive that arithmetic summarisation leads to 
results which are difficult to interpret. Charting, however, 
has a great importance in the presentation and discussion of
categorical data. 

Different diagrammatic forms are used for the representation
of statistical data. The selection of the one to be used in
a particular case depends mainly upon the characteristic of
the data and the object of study. In fact the selection of the 
form is far more difficult than its actual drawing when the 
form has been selected. 

Representation through diagrams may be made in several
ways. The following are some of the important methods:

1. Bars,

2. Rectangles and squares, and

3. Circles.




Besides these, three-dimensional diagrams, viz., cubes,
cylinders and blocks are also used for diagrammatic representation.
Pictograms, viz., statistical maps and pictures, are also
seen in statistical returns. Such devices, however, are hardly
better than rectangular representation. Complications in
calculations and difficulties in construction often block the
popularity of three-dimensional diagrams. Pictures also are not
so popular. They hardly bring small differences to light.
One cannot make comparisons between two pictures so easily
as one can between the lengths of lines and bars. Again,
pictures cannot be safely used whenever and wherever desired.

It should, however, be noted that there is no objection 
to the use of cubes, blocks and pictures. As a matter of fact 
they also represent the results as others do, but not so precisely 
and accurately. Bars, rectangles and squares are easiest to
draw and they can be more precise and accurate in both 
calculation and construction. Hence a detailed study of them 
is required. 

Bar Diagrams 

The term 'bar' is used for a thick wide line. The width
of the bar is shown merely to make the diagram look more
attractive, beautiful and explanatory. The bar diagram is the
easiest and most adaptable general purpose chart. Though
this type of chart can be used for any type of series, it is
especially satisfactory for categorical series. The bars may
be vertical as well as horizontal. No rigid rules can be cited
for preferring vertical or horizontal position. Whichever plan
is used, in a single study all the bars are required to be of the
same width, separated by equal distances, and resting on one
and the same line, which is called the 'base'. Colour, lines,
or dots may also be used in the bars but the colouring, lining
or dotting used should be uniform in a single study.



The data given below are represented by a bar diagram.


TABLE 8.1

Birth rate of a few countries of the world during
the year 1934

Country               Birth Rate

India                     33
Germany                   16
Irish Free State          20
Soviet Russia             40
New Zealand               30
Sweden                    15



Fig. 8.1: Showing Birth Rate of a few Countries of the World
During 1934

A careful study of the above diagram will reveal the 
following points : 

The layout. The data should be laid on the grid in such
a manner and on such a scale, so as to show most effectively
the variable under consideration. As far as possible the
diagram must be centred on the grid.

The scale. The scale caption, or designation, must be
written on the top of the diagram.

Ruling the grid. Grid marks must be introduced so that the
eye may better evaluate and compare the height of the columns.

Arrangement of columns. The bars should be arranged from
left to right (or from top to bottom in the case of horizontal
position) in order of their magnitude. Space between the bars
gives a pleasing effect.
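
The points above can be tried out with a short program. What follows is only a minimal sketch in Python and is not part of the original method: the function names `bar_lengths` and `render`, and the scale of one character for every two units of birth rate, are assumptions made for illustration. It takes the figures of Table 8.1, arranges the bars in descending order of magnitude and draws them horizontally on a common scale.

```python
# Birth rates of Table 8.1 (figures as printed in the table).
birth_rates = {
    "India": 33,
    "Germany": 16,
    "Irish Free State": 20,
    "Soviet Russia": 40,
    "New Zealand": 30,
    "Sweden": 15,
}


def bar_lengths(data, units_per_char=2):
    """Return (label, value, bar length) tuples, largest value first.

    units_per_char is the assumed scale: how many units of the
    variable one printed character stands for.
    """
    ordered = sorted(data.items(), key=lambda item: item[1], reverse=True)
    return [(name, value, round(value / units_per_char))
            for name, value in ordered]


def render(data, units_per_char=2):
    """Draw horizontal bars of equal thickness resting on a common base."""
    label_width = max(len(name) for name in data)
    lines = []
    for name, value, length in bar_lengths(data, units_per_char):
        lines.append(f"{name:<{label_width}} | {'#' * length} {value}")
    return "\n".join(lines)


print(render(birth_rates))
```

Because every bar is drawn on the same scale and starts from the same base, the eye can compare the lengths directly; changing `units_per_char` only stretches or shrinks the whole diagram.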

A bar may be

(a) a simple bar,

(b) a sub-divided bar,

(c) a percentage bar,

(d) a bilateral bar,

(e) a split bar.

Simple Bar 

A simple bar is a one-dimensional diagram in which the 
bar represents the whole of a magnitude. Figure 8.1 shows 
the birth rate of a few countries of the world during the year 
1934. In this figure the bars are in a vertical position. 

Sub-divided Bar 

The bar diagram may also be used to exhibit the divisions
of a whole into its component parts. If a given magnitude can
be split up into sub-divisions, or if there are different quantities
forming the sub-divisions of the totals, simple bars may be
sub-divided in the ratio of the various sub-divisions to exhibit
the relationship of the parts to the whole.

Table 8.2 gives the weekly milk consumption per head
under different groups of income in 1936 in England. Fig. 8.2
represents the milk consumption; different bars represent the
total quantity of milk consumed. The shaded portion indicates
the condensed milk and the remaining blank is the fresh milk
consumed. Thus the bars are divided into two parts, viz.,
fresh milk and condensed milk.





TABLE 8.2

Weekly milk consumption per head in England during
the year 1936

Income per head              Fresh Milk    Condensed Milk
per week                        lb               lb

up to 10 shillings              1.0              0.6
10 shillings to 15              2.0              0.5
15     "     to 20              2.4              0.5
20     "     to 30              2.8              0.4
30     "     to 40              4.0              0.3
above 40 shillings              5.0              0.2

(B. Com., Agra, 1946)



Fig. 8.2: Showing Weekly Milk Consumption in England
during 1936








The above is a common type of simple sub-divided bar
diagram. At a glance, we note the following points:

(i) the consumption of condensed milk is less than that of fresh
milk in all classes;

(ii) the consumption of condensed milk decreases slightly
as income increases;

(iii) the changes in the total consumption of milk are
dominated by the changes in consumption of fresh milk.

The data of table 8.2 may also be represented in another
form which is gaining popularity these days. This is shown
in figure 8.3. In this two bars are shown side by side, one
(shaded) representing fresh milk and the other representing
condensed milk.



Fig. 8.3—Showing Weekly Milk Consumption in England 
during 1936 

Table 8.3 gives the birth rate and death rate of different 
countries. Bars have been first drawn to represent births 
(fig. 8.4), and from these bars portions equal to death rates have 
been left white to distinguish from the remaining (shaded) 
portions which show the survival rate. 






TABLE 8.3

Birth Rate and Death Rate of a few Countries of the
World during the year 1931

Country               B/R     D/R

India                  33      24
Germany                16      11
Irish Free State       20      14
Soviet Russia          40      18
New Zealand            10       6
Sweden                 15      12

(B. Com., Lucknow, 19?8)



Fig. 8.4—Showing Birth and Death Rates 
It should be noted that in fig. 8.2 the bars have been drawn
equal to the total amount of milk consumed and out of these
portions equal to the condensed milk have been cut. In fig.
8.4 bars are made equal to birth rate (and not equal to the
total of birth and death) and out of births, deaths have been
cut. The data given in table 8.3 can be represented in the
following manner also:












Fig. 8.5—Showing Birth and Death Rates 


Percentage Bar 

The distribution of an aggregate into its parts may also be 
effected upon a percentage basis. In such a case a single bar 
may be sub-divided into two or more than two components. 

Illustration:

The following table gives the details of monthly expenditure 
of two families A and B. Show this by means of percentage 
bars. 


TABLE 8.4

Items of Expenditure        Family A             Family B
                            (Income Rs. 500)     (Income Rs. 800)

1. Food                          140                  240
2. Clothing                       80                  160
3. House Rent                    100                  120
4. Education                      30                   80
5. Fuel and Lighting              40                   40
6. Miscellaneous                  40                   80








Solution:

                              Family A                       Family B
Items of              Expendi-  Percent-  Cumulative  Expendi-  Percent-  Cumulative
Expenditure           ture Rs.  age       percentage  ture Rs.  age       percentage

Food                    140       28         28         240       30         30
Clothing                 80       16         44         160       20         50
House Rent              100       20         64         120       15         65
Education                30        6         70          80       10         75
Fuel and Lighting        40        8         78          40        5         80
Miscellaneous            40        8         86          80       10         90
Saving                   70       14        100          80       10        100

Income                  500      100                    800      100





Fig. 8.6—Showing Details of Monthly Expenditure of two 
Families, A and B 
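
The arithmetic behind the percentage bars can be checked with a few lines of code. The following is a minimal sketch in Python, offered only as an illustration: the function name `percentage_bar` and the dictionary layout are assumptions, while the figures are those of the illustration above. Each item is expressed as a percentage of income, the unspent balance is treated as saving, and the cumulative percentages give the heights at which the bar is to be cut.

```python
# Monthly expenditure of the two families (Rs.), from the illustration.
family_a = {"Food": 140, "Clothing": 80, "House Rent": 100,
            "Education": 30, "Fuel and Lighting": 40, "Miscellaneous": 40}
family_b = {"Food": 240, "Clothing": 160, "House Rent": 120,
            "Education": 80, "Fuel and Lighting": 40, "Miscellaneous": 80}


def percentage_bar(expenses, income):
    """Return rows of (item, amount, percent, cumulative percent).

    The balance of income not spent is added as a final 'Saving' item,
    so that the components always add up to 100 per cent.
    """
    items = dict(expenses)
    items["Saving"] = income - sum(expenses.values())
    rows = []
    cumulative = 0.0
    for item, amount in items.items():
        percent = amount * 100 / income
        cumulative += percent
        rows.append((item, amount, round(percent, 1), round(cumulative, 1)))
    return rows


for name, expenses, income in (("A", family_a, 500), ("B", family_b, 800)):
    print(f"Family {name} (Income Rs. {income})")
    for item, amount, percent, cumulative in percentage_bar(expenses, income):
        print(f"  {item:<18} {amount:>4} {percent:>6} {cumulative:>6}")
```

Since both bars are of the same height (100 per cent), differences in the absolute incomes of the two families do not show in the diagram; only the proportions do.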





























Bilateral Bar 

Bilateral charts, also called 'gain or loss charts', employ bars
to show plus and minus direction from the point of reference.
Fig. 8.7 is such a chart and shows the percentage of profit or loss
on the sale of one ton of coal during the years 1924 and 1928.


TABLE 8.5

                                        1924       1928
                                        Rs.        Rs.
Cost per ton:
    Wages                              12.74       7.95
    Other costs                         5.46       4.51
    Royalties                           0.56       0.50

    Total                              18.76      12.96

Proceeds of sale per ton               19.91      12.16

Profit (+) or Loss (-) per ton         +1.15      -0.80

In order to make a proper comparison between the costs,
sale proceeds and profit and loss for the two years in question
the device of percentage bars is most suited. The first step,
therefore, in the construction of the diagram is to convert the
above data into percentages of sale proceeds for each of the
two years.

                                        1924       1928
Cost per ton:
    Wages                              63.99      65.38
    Other costs                        27.43      37.09
    Royalties                           2.81       4.11

    Total                              94.23     106.58

Proceeds of sale per ton              100        100

Profit (+) or Loss (-) per ton         +5.77      -6.58
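
The conversion into percentages of sale proceeds can be sketched as follows. This is only an illustrative Python sketch; the function name `percent_of_proceeds` and the data layout are assumptions. Figures computed this way may differ from the printed percentages in the last decimal place, because of rounding.

```python
# Cost and sale proceeds per ton of coal (Rs.), from Table 8.5.
coal = {
    1924: {"costs": {"Wages": 12.74, "Other costs": 5.46, "Royalties": 0.56},
           "proceeds": 19.91},
    1928: {"costs": {"Wages": 7.95, "Other costs": 4.51, "Royalties": 0.50},
           "proceeds": 12.16},
}


def percent_of_proceeds(costs, proceeds):
    """Express each cost as a percentage of sale proceeds.

    Returns the percentage shares and the margin left over:
    a positive margin is a profit (drawn above the base line),
    a negative margin is a loss (drawn below the base line).
    """
    shares = {item: amount * 100 / proceeds for item, amount in costs.items()}
    margin = 100 - sum(shares.values())
    return shares, margin


for year, record in coal.items():
    shares, margin = percent_of_proceeds(record["costs"], record["proceeds"])
    print(year)
    for item, share in shares.items():
        print(f"  {item:<12} {share:7.2f}")
    print(f"  {'Margin':<12} {margin:+7.2f}")
```

The sign of the margin decides on which side of the base line the last segment of the bar falls, which is what makes the chart bilateral.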


Two equal bars to represent the proceeds of sale per ton for
the two years are drawn. The percentages by way of
wages, other costs and royalties are then cut off in the same
order. The surplus in the year 1924, indicated by the white
portion, represents a profit of 5.77 per cent. The deficit of 1928
is represented by the portion of the bar extended below the base.







Fig. 8.7: Showing Cost, Sale Proceeds, Profit or Loss per ton for 1924 and 1928

Split Bar

For the purpose of showing diagrammatically component parts of a total,
the split bar is a promising new device. Fig. 8.8 illustrates its use to show
the proportion that the constituent divisions of cost bear to
the total selling cost.

Bar diagrams are used for charting time series also.
Fig. 8.9 shows the electrical development in India and
fig. 8.10 shows the assets and liabilities
of the issue department of the Reserve Bank of India.




Fig. 8.8 







Fig. 8.9 

Show by means of a suitable diagram the following:

Assets of the Issue Department
in lakhs of rupees

Year        Gold Coin      Foreign       Rupee      Rupee
            & Bullion      Securities    Coin       Securities

1942-43       44,42         3,19,11      22,23      1,39,53
1943-44       44,42         6,43,52      14,28        85,45
1944-45       44,42         8,63,73      13,52        57,95
1945-46       44,42        10,61,26      15,53        57,84
1946-47       44,42        11,33,88      19,43        57,04













Fig. 8.10: Showing Assets of the Issue Department

Rectangles

Though the 'bar' is the most common method of representing statistical
data, there are occasions when it does not serve the purpose and as such
other forms, viz., rectangles and circles are employed. A rectangle is
a two-dimensional diagram, i.e., its height as well as width are taken
into consideration for purposes of representation. (It may be repeated
here that in a bar diagram it is only the height of the bar and not its
width that is significant.) Rectangles are used when it is desired to give
more detailed information than can be conveyed by the 'bar'.

Consider the following data : 

TABLE 8.6

Factory    Wages     Materials    Other Costs    Profit    No. of Units
            Rs.         Rs.           Rs.          Rs.       Produced

   A       3,000       5,000         1,000        1,000       1,000
   B       2,000       3,000           800          500         700

Represent the above information, and also the cost and
profit per unit diagrammatically. Since details of the proceeds
per unit as well as the quantity produced are to be represented
a rectangular diagram should be used.

The products of factory A realise Rs. 10 per unit:

$$\frac{3000 + 5000 + 1000 + 1000}{1000} = \frac{10000}{1000} = \text{Rs. } 10$$

The products of factory B realise Rs. 9 per unit:

$$\frac{2000 + 3000 + 800 + 500}{700} = \frac{6300}{700} = \text{Rs. } 9$$

Thus the height of the two rectangles will be in the ratio of
10 : 9 and their width in the ratio of units produced by the
two factories, i.e., 10 : 7. Assuming a scale of 1" = Rs. 2 and
1" = 500 units we get the following diagram:



[Legend: Wages, Materials, Other Costs, Profits]

Fig. 8.11: Showing Cost, Profits, Total Number of Units
produced etc.





It will be noted from the above diagram that it represents
(1) the price per unit, giving details of expenses incurred in
producing each such unit, (2) the profit per unit, and (3) the
quantity sold as well as the total expenses under each head.
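
The dimensions of the two rectangles follow directly from the figures of Table 8.6. The sketch below, in Python, is an illustration only: the function name `rectangle` is an assumption, and the scale is read as 1 inch = Rs. 2 for the height and 1 inch = 500 units for the width. It returns the price per unit, the height and width of each rectangle, and the height of each band (wages, materials, other costs, profit) within it.

```python
# Figures of Table 8.6: amounts in Rs., output in units.
factories = {
    "A": {"Wages": 3000, "Materials": 5000, "Other Costs": 1000,
          "Profit": 1000, "units": 1000},
    "B": {"Wages": 2000, "Materials": 3000, "Other Costs": 800,
          "Profit": 500, "units": 700},
}


def rectangle(record, rs_per_inch=2.0, units_per_inch=500.0):
    """Work out the rectangle for one factory under the assumed scale."""
    units = record["units"]
    per_unit = {head: amount / units
                for head, amount in record.items() if head != "units"}
    price = sum(per_unit.values())  # proceeds per unit
    return {
        "price_per_unit": price,
        "height_in": price / rs_per_inch,
        "width_in": units / units_per_inch,
        # Height of each band; band height x width is proportional
        # to the total amount under that head.
        "bands_in": {head: value / rs_per_inch
                     for head, value in per_unit.items()},
    }


for name, record in factories.items():
    r = rectangle(record)
    print(name, round(r["price_per_unit"], 2),
          round(r["height_in"], 2), round(r["width_in"], 2))
```

The area of each rectangle (height multiplied by width) is proportional to the total proceeds of the factory, which is why a rectangle conveys more than a bar of the same height.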

Represent diagrammatically the following:

TABLE 8.7

Details of the Cost of two Commodities

                                     A            B

Price per unit                     Rs. 4        Rs. 5
Quantity sold                       40          [illegible]
Value of raw material used         Rs. 52       Rs. 30
Other production expenses          Rs. 64       [illegible]
Profit                             Rs. 44       Rs. 40



Fig. 8.12: Showing the Cost of two Commodities

A comparative study of family budgets can best be conducted
by means of rectangular representation.








TABLE 8.8

Family Budget of Three Families

                              Family A          Family B          Family C
Items of Expenditure        (1)     (2)       (1)     (2)       (1)     (2)
                            Rs.      %        Rs.      %        Rs.      %

Food                        12       60       30       50       90       30
Clothing                     2       10        7     11 2/3     35     11 2/3
House Rent                   2       10        8     13 1/3     40     13 1/3
Education                   1.8  [illegible]   3        5       12        4
Litigation                   1        5        5      8 1/3     40     13 1/3
Conventional Necessaries    0.8  [illegible]   3        5       60       20
Miscellaneous                1        5        4      6 2/3     23      7 2/3

Total                       20      100       60      100      300      100

(1) Actual expenditure; (2) percentage of total expenditure.


Column 2 in the case of each family has been obtained by
expressing each item of expense as a percentage of the total
expense of that family. (See fig. 8.13.)


Squares

Squares are used when it is desired to compare quantities
that differ widely in magnitude. If we are to present diagrammatically
the population of two towns as it is given in table
8.9, bar diagrams will fail to do the job. The difference in
the two populations is so great that the height of one bar
shall be 36 times as great as that of the other so that one bar
would become too big and the other too small. To overcome
this difficulty squares are used.

The side of a square varies as the square-root of its area.
If the areas of two squares are in the ratio of 4 : 9, the ratio of
their sides would be $\sqrt{4} : \sqrt{9}$, i.e., 2 : 3.










Fig. 8.13: Showing Expenditure of Three Families, A, B and C

Table 8.9 below gives the total population of the two towns
A and B and also the proportion of literates and illiterates.


TABLE 8.9

Town      Literates      Illiterates      Total population

 A           1,000           9,000             10,000
 B         160,000         200,000            360,000


(B. Com., Agra, 1949)
From the above table it is clear that the population of two 
towns differs widely and hence square diagram is better fitted 
for this case. To do that we proceed as follows : 











(i) Find out the square-roots of the population of the
towns A and B. The square-roots are $\sqrt{10000}$ and $\sqrt{360000}$, or
100 and 600.

(ii) Divide the resultant figures by some common figure,
say 100. The quotients obtained will be 1 and 6.

(iii) Draw two lines whose lengths are 0.5 and 3 inches
respectively.

(iv) Then draw squares on the two lines.

(v) In order to show the literates and illiterates within the
squares we will first find out the ratio of literates to illiterates
in the two towns respectively. It is:

In town A   1 : 9

In town B   4 : 5



Fig. 8.14: Showing Literates and Illiterates in
Towns A and B




The next step will be to divide the square drawn for
town A into 10 parts; the portion showing literates will be
shown as in figure 8.14. Similarly, the literates will also be
shown in the square representing the population of town B.

Thus figure 8.14 exhibits the required information. Due to 
the great time and labour involved in drawing accurate squares, 
an alternative method (pie diagrams) may be employed. 
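
The steps for the square diagram can be put into a short sketch. The Python below is an illustration under stated assumptions: the function names `square_side` and `literate_share` are hypothetical, the common divisor is taken as 100, and one unit of the quotient is drawn as half an inch, which reproduces the lengths of 0.5 and 3 inches used above.

```python
import math

# Population of the two towns, from Table 8.9.
towns = {
    "A": {"Literates": 1000, "Illiterates": 9000},
    "B": {"Literates": 160000, "Illiterates": 200000},
}


def square_side(population, divisor=100, inches_per_unit=0.5):
    """Side of the square in inches: the square-root of the population,
    reduced by a common divisor and converted by the assumed scale."""
    return math.sqrt(population) / divisor * inches_per_unit


def literate_share(town):
    """Fraction of the square's area to be marked off for literates."""
    total = town["Literates"] + town["Illiterates"]
    return town["Literates"] / total


for name, town in towns.items():
    population = town["Literates"] + town["Illiterates"]
    print(name, square_side(population), round(literate_share(town), 3))
```

Because the sides are in the ratio 1 : 6, the areas are in the ratio 1 : 36, which is the ratio of the two populations.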

Circles 

Circular diagrams are an alternative to square diagrams. Just
as the areas of squares vary in the same proportion as the
squares of their sides, likewise the areas of circles vary as the
squares of their radii. If the sides of two squares are in the
proportion of 2 : 3, their areas would be in the proportion of 4 : 9.
From this, it follows that if the radii of two circles are in the
same proportion as the sides of two squares, the areas of the
circles would also be in the same proportion as the areas of the
squares. Hence the lengths which are used as the sides of
squares may also be used as the radii of circles. In fig. 8.14
the sides of the squares are in the proportion of 1 : 6. By keeping
the radii of the two circles in the proportion of 1 : 6, we
can as well draw the circles representing the data shown in
Table 8.9.

Just as bars, rectangles and squares may be sub-divided in order to represent component parts, circles may similarly be sub-divided into various sectors. Such a sub-divided circle is known as a pie diagram.

Pie diagrams show the changes in total and component parts. In figure 8.15 the two circles show (i) the changes in the total of clearing house statistics, and (ii) the share of individual towns in such totals. Interpretation of angular diagrams requires a visual comparison of the areas of the two circles, which is quite difficult, if not impossible, to make. That is why pie charts are not generally used.

TABLE 8.10

Clearing House Statistics in 1940-41 and 1947-48

                    1940-41                      1947-48
Town         Total amount   Degrees       Total amount   Degrees
               in Rs.                       in Rs.

Bombay          80,232       139.4          2,55,264      139.7
Calcutta      1,00,853       175.2          2,59,996      142.3
Delhi            2,853         5.0            12,646        6.9
Kanpur           1,920         3.3            10,983        6.0
Karachi          4,676         8.1            27,481       15.0
Lahore           1,633         2.8             4,954        2.7
Madras          10,865        18.9            34,794       19.0
Others           4,228         7.3            51,896       28.4

Total         2,07,260       360.0          6,58,014      360.0

The degrees for each town are found by multiplying its amount by 360 and dividing by the total for the year; e.g. for Bombay in 1940-41, 80232 x 360 / 207260 = 139.4, and in 1947-48, 255264 x 360 / 658014 = 139.7.
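The degree columns of Table 8.10 can be reproduced mechanically. The sketch below uses the 1940-41 amounts from the table; the function name `sector_angles` is an assumption made for illustration only.

```python
def sector_angles(amounts):
    """Convert component amounts into pie-diagram angles in degrees."""
    total = sum(amounts.values())
    return {name: amount * 360 / total for name, amount in amounts.items()}

# Clearing house amounts for 1940-41, as given in Table 8.10.
clearing_1940_41 = {
    "Bombay": 80232,
    "Calcutta": 100853,
    "Delhi": 2853,
    "Kanpur": 1920,
    "Karachi": 4676,
    "Lahore": 1633,
    "Madras": 10865,
    "Others": 4228,
}

angles = sector_angles(clearing_1940_41)
for town, angle in angles.items():
    print(f"{town:<10}{angle:6.1f}")
```

Since every angle is the same fraction of 360 that the amount is of the total, the angles necessarily add up to 360 degrees.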


Method of Constructing a Pie Diagram. A pie diagram is made with the aid of a protractor. The steps involved in the construction are:

First, calculate the square root of the total amount of clearing house returns for the two years under consideration. The square roots are √207260 and √658014, i.e. about 455 and 811.

This will supply us the basis for determining the radii of the circles, which can be taken as 0.65" and 1.16" (dividing 455 and 811 by 700). Further details are explained in Table 8.10.

Maps and Pictures

The main object of representing data in the shape of pictures is to help in quick visualisation of comparisons of magnitudes; for example, a pictogram might represent by a picture of men the population of India, accompanied by a picture of proportionately fewer or more men representing, respectively, the populations of Britain, the U.S.A., China, Japan, etc. More often pictures are adopted to help in visualising the proportional parts of a whole magnitude, e.g., a rupee is shown divided into sectors, representing the way in which the revenue of the Government is spent. Fig. 8.16 illustrates the number of passengers carried by Indian Railways. The picture of each man represents 100 crores of passengers.

Fig. 8.16: No. of passengers carried (pictogram, one row of figures for each year)

Figure 8.17 is much more explanatory and illustrates more 
than one variable. It shows the area under main oil-seeds 
and their yield in the Indian Union in the year 1947-48. The 
area is illustrated by the use of squares, whereas the yield is 
represented by the picture of a bag within the square. 




Fig. 8.17 

Cartograms or statistical maps are of three general types:
(i) Cartograms by dots or points,

(ii) Cartograms by colours or shades,

(iii) Cartograms by cross-hatching.

Cartograms by Dots. In such maps the frequency of the data is represented by various types of dots. This type of cartogram may be further sub-divided into three classes.






In the first class, dots varying in size for different quantities or frequencies are used. This type of cartogram is not considered satisfactory, because of the necessity of using varying sizes of circles as dots.

In the second class of dot cartogram, dots of uniform size are used. They can be counted to figure out the total. Dots may be shaded to indicate different values. Generally the greatest quantity is shown by a solid dot, with three-quarters, one-quarter and other shading indicating smaller quantities. This type of cartogram is also unpopular because of the mechanical difficulty of arriving at the proper magnitude to allot to each dot of uniform size. 'If the magnitude assigned to each dot is too large, it becomes difficult to show graphically the small quantities relating to geographical locations where the characteristic is scarce. On the other hand, if the magnitude assigned to each dot is too small, this results in too great a crowding of the dots in areas where the characteristic is very plentiful.'¹

In the third class of dot cartogram the size of the dot is immaterial; the relative frequency with which it occurs is all that is important. The dots are so small that they cannot be easily counted.

Cartograms by Colours or Shades. This type of cartogram is rarely used, as its cost of printing is much higher than that of other types of cartograms. If a coloured cartogram is used to represent a phenomenon which is of a graduated type, it is likely to mislead the reader. If colours are to be used at all, they should be confined to different intensities of the same colour. If the number of shades is too large, two colours, say red and blue, may be employed. Such cartograms are mostly used in the preparation of physical maps.

Cartograms by Cross-hatching. This type of cartogram is the most popular because it is cheaper and more effective. The different types of phenomena under this method are represented by lines, dots, circles and other such symbols.








General Illustration. Design a single diagram to exhibit the entire information given by the following statistics, so as to bring out how the causes of death are distinguishable in their intensities and how they are also definitely associated with social status.


TABLE 8.11

Causes of death               Illegitimate   [2]    [3]    [4]    [5]    Total
                              Children

Diarrhoea and Enteritis            24         17     11      7      1      60
Prematurity and Atrophy            56         33     32     34     24     179
Bronchitis and Pneumonia           22         25     19     11      4      81

All causes                        102         75     62     52     29     320

(The headings of columns [2] to [5] are illegible; the last column is the total of each row.)






EXERCISES 

1. 'Quick visualisation of many a rather complex situation can be readily achieved by merely looking at a simple chart. So useful is the chart in giving a quick grasp of the characteristics of data that it has been adopted in many popular books, magazines and in the newspapers.' Discuss.

2. 'Diagrams help us to visualise the whole meaning of a numerical complex at a single glance.' Comment.

3. What points should be taken into consideration while presenting a table diagrammatically? What, in your opinion, are the tests of a good diagram?

4. Name the various types of pictograms and describe in detail the method to be followed in drawing pictograms.

5. Write short notes on:

(i) Split Bar Chart,

(ii) Cross-hatched Zone Diagram,

(iii) Pie Diagram,

(iv) Sub-divided Bar Diagram.

6. Discuss the utility of cartograms, and explain
(i) Cartograms by dots or points,

(ii) Cartograms by colours or shades,

(iii) Cartograms by cross-hatching.

7. 'Diagrams are handy tools in the hands of sales executives.' Discuss.

8. Draw a simple diagram to represent the following statistics relating to the gross earnings of Indian Railways from 1942-43 to 1951-52.


Year          Rupees in Crores

1942-43            168
1943-44            199
1944-45            233
1945-46            244
1946-47            222
1947-48            [illegible]
1948-49            234
1949-50            258
1950-51            265
1951-52            294


9. Represent the following data by a suitable diagram.


In lakhs of rupees

                                      1938      1943      1944      1945

Total Imports                        15,342    11,084    18,010    23,754
Total Exports (including
Re-exports)                          16,189    18,632    21,884    21,813






10. Represent the following data diagrammatically:

Percentage Share in India's Exports


Countries 

\<m 


1944 

1945 

U.K. 

Ml 

29*2 

300 

• 

Total other empire countries 

18*6 

37*5 

Mi 

50*4 

VS. 

ft 3 

IH 

22-0 

23*2 

Other non-empire countries 

BO 

15-7 


17 1 


11. 'Diagrams are meant for a rapid view of the relation of different data and their comparison.' Discuss.

Draw a 'Bar' or 'Pie' diagram to represent the following data:


Output and Cost of Production of Cm »/ 



1924 

192ft 

Cost per ton disposed commercial!* 


■" ... 

Wages 

12 74 

7 95 

Other costs 

fHf, 

4*5! 

Royalties 

O' % 

050 

Total 

1ft- 7 ft 

12% 

Sale* proceeds per ion 

19*91 

12-lb 

Profit or loss per ton 

4 115 

-0 80 


(M. A., Apa) 

12. Represent the following figures by a suitable diagram:


The Amount to be Spent on Various Heads under the Gandhian Plan of Wardha

Various Heads 

Rs. 

in crores 

Agriculture 

” . 

TTw 

Rural Industrie* 


m 

ijsrgr Scale and Key Industries 


1000 

Public Utilities 


10 

Trantpor e 


400 

Public Health 


260 

Education 


293 

Research 


20 


Total 

3500 

13. The following table gives the details of monthly expenditure of three families:

hems of Expenditure family A 

b 

c 

Rt. As. 

Rs. A*. 

R s As 

IOT ... ft—o 

5ki—ff 

Iff-6 

Clothing . . 2—0 

7-0 

35 — 0 

House Rem ... 2 — 0 

8-0 

40-0 

Education ... 1—8 

3-0 

12-0 

litigation ... 1—8 

S—0 

40**0 

Conventional necessity <h~8 

S-0 

00-0 

Alt tcdlsnrout ... 1—0 

4-0 

23-0 


Represent the above figures by a suitable diagram. Which family is spending the money most wisely? Give reasons.


(M.A., Allahabad, 1937)












Reprwent the following data diagranmiatkaUy : 

TwkU Shmmg At DutrtM** if BmA AAmets 

Aicm4ing u Pwthwt m mftnm Camtrit* 
%*Ut 


Industry 315 27 0 55*0 24 0 

Commerce 47 « 21 6 35*4 25*0 l 

Agriculture 30 90 — 250 

Personal k Professions! 8*9 30 9 3*7 Ifl O 

Miscellaneous 8*1 U'5 5*4 90 


15. The following table gives the details of the cost of construction of a house in Allahabad:




Represent the above figures by a suitable diagram, 

(B . Com., AttahtM, '.94') 

16- Draw a suitable diagram to represent the following informations : 



(*« &*•> A***' f94S) 

17. Value of the imports of glass and glassware into India from different 
countries during the year 1931 >32. 


JM»» 

Chechoslovakia 

Germany 


42 lakh of Ra. 
23 lakh of Rs. 
20 lakh of Rt. 


UK. 

Belgium 

Other 

countries 


13 lakh of Rs 
13 lakh of Rs. 

U lakh of Rs. 


Represent the above figures by suitable diagrams. 

(«. Cm., dJ/«A*W, igsj) 

IB. Represent the following by a suitable diagram : 


Pr incipal Heads of 
Revenue 

Custom 

Central Excise Duty 

Corporation Tax 

Tam on Income 

Salt 

Opium 

Other Heads 


1936*39 
t-akhs of Rs. 


1939*40 
Lskbt of Ra. 















19. The following gives the birth-rate and death-rate of a few countries of the world during the year 1931:


Country 

Birth-rate 

Death-rate 

Sim 


44 

...... ’"If* .' 

Canada 


24 

U 

a&A. 


19 

12 

India 


33 

24 

Japan 


32 

19 

Germany 


16 

U 

France 


18 

16 

Norway 


17 

It 

Irish Free State 


20 

14 

UX 


16 

12 

Soviet Russia 


40 

18 

Australia 


20 

9 

New Zealand 


18 

8 

Palestine 


53 

23 

Sweden 


15 

12 

(B. Com., Lucknow, [year illegible])

20. The following table gives in arbitrary units the cost of production of a factory in biennial averages:




■■ .. UiitrUl .....sr■» 35.3*.35.3#*'«.ir & 

Labour 10 8 U U 11 12 7 3 8 

Overhead 14 10 15 16 17 20 12 9 12 

Total _ 61 43 61 63 63 70 41 31 40 

Draw the graphs of different components of costs as percentage of 
total cost and comment on the data. 

(M A. t P*4ff«» Ig4|) 

21, Show the details of monthly expenditure of two families given below 
by means of two-dimensional diagrams : 


item of expenditure 

■mm 


Family B 

Income R*. 500 p.m. Income Rs. 400 p.m. 

Food 

m 


120 

Clothing* 

80 


80 

House Rent 

too 


60 

Education 

30 


40 

Fuel and Lighting ... 

40 


20 

Miscellaneous 

40 


40 



(M-A., S*PP*> #95*1 

22. Details of prices, cost and quantify sold of three 

commodities are 


sub able diagram. 


1 

n 

Ill 

Price of a commodity 

Rs , 3 per 

Rs. 4 per 

Rs. 5 per 


Unit 

Unit 

Unit 

Quantity sold 

.too .. ‘ 

. mr~ 

.. 70 ’ 

Value of raw materials 

Rs. 115 

Rs. 120 

Rs 130 

Expense* of management 

Rt. 30 

Rs. 30 

Rs. S5 

Expenses on labour 

R*. 40 

Rs. « 

Rs. 53 

Other expenses 

Rs. 15 

Rs. 25 

Rs. SO 

Profit* 

Rs. 100 

Rs 100 

Rs. 100 

















2$, Figures of Industrial Production of India in 1945 are given below. 
Draw circles to represent them. 


-5751- m - 

Ju*e 1086 

Ammonium 

Sulphate 220 

&mew 

Paint* 

Paper 


“TW3— 

51 2 

W2 

24. Distribution of expenditure under 
R«. > t» as follows .* 

the Five-Year 

Plan 

(in trore* of 


Centre 

State 

"" Total 

"^Transport $ Oormmmicatiou ’’ 


.ST~ 

49/ 

Irrigation & power 

Social service including 

'M 

m 

Mil 

rehabilitation 

Agriculture Ac Community 

m 

234 

425 

Development 

m 

174 

:m 

Jnudstry 

147 

2 f> 

1 71 

Mtscelianecms 

41 

n 

52 


Draw a circle divided into sector* to rcprevm the above data. 


%% 

The following data give* population in millions ; of the various uat.rs. 


Repretent it on a map of India. 




State 

Population 

Stair 

Population 


Part A 


frsv an tor r*(*<x h tn 

9-3 


Assam 

9*0 

PEPMJ 

V5 


Bihar 

40*2 

Rajasthan 

jv:j 


Bombay 

0-2 

Saurashtra 

•H 


Madhya Pradesh 

21*2 




Madras 

57*0 

Part 0 



Orissa 

14 C» 




Punjab 

W. Bengal 

12 b 

Ajmer 

Bhopal 

0 7 


24 B 

OB 



Delhi 

17 


Part B 


Himachal Pradesh 



Hyderabad 

l*F7 

Yindhya Pradesh 

V® 


Madhya Bharat 

BO 

Manipur 

On 


Mysore 

9 1 

Tripura 

Odi 

2 b. 

By pictograin reproduce thr following uata of Colombo 
how the money was xprr>t. Money given in million £, 

Plan about 


193 M2 

1952*53 

{k) Country 1951-52 

1952-51 


Agriculture hi 

R2 

Burma 11 

17 


Multipurpose 
project 31*50 

Transport St 

H 

Ovloq 20 

23 


Communication 95 25 

1H 

India 219 

2«H 


Fuel St power 21 

M 

Indonesia 37 

m 


Industry 15 25 

21 




Sorial service 79 

m 5 

Matava k British 





Borneo 27 

n 


Research A 




others 29 

41 

Pakistan 31 

44 


27. Represent the data of questions 19, 20 and 21 by three-dimensional diagrams. What are the advantages and disadvantages of this method over two-dimensional representation?












28. Represent the following data of rice production in various States of India in 1949-50 by cubes (figures given in '000 tons).


Assam 

1737 

Jammu fit 


Bihar 

3631 

Kashmir 

46 

Bombay 

IDttO 

Madhya Bharat 

46 

Madhya 


Mysore 

219 

Pradesh 

2 m 

Rajasthan 

14 

Madras 

4050 

Himachal 


Orissa 

204 & 

Pradesh 

29 

Punjab 

M3 

C-oorg 

33 

U,K 

2507 

Manipur 

m 

\V. Bengal 

W»2 

Tripura 

m 

Hyderabad 

34# 

Vmdhya Pradesh 

157 

29. Represent by means of a suitable diagram the following data on industrial production of 1950 (figures in '000 tons).


Coal 

31,994 

Sulphuric Acid 

103 

Wheat Flour 

335 

Soap 

73 

Ckdfcc 

21 

Pig lion 

1326 

Omrru 

2613 

Sugar 

97fi 

Paper 

109 

jute Textile* 

83fi 


30. Represent by a suitable diagram the following data of patients who died in hospitals and dispensaries (classes I-VII) in India.


Disease 

!94fi 

1947 

1940 

l-holer a 

1445 

4595 

fi 172 

Dysentery 

200 ) 

2:402 

2039 

Tubercle of l.ung* 

3453 

3234 

4046 

Disease of Nervous System 

7100 

1974 

1653 

Disease oF Respiratory System 

1540 

17% 

1741 


31. The following figures give the net change in Business Inventories in billions of dollars in various years in America. Represent them by a suitable diagram.


w 

w 

tr 


t 

w 


v 

V 


& 

n 

c 

* 

c 

* 


n 

c 

> 

n 

w' 

V. 

> 

£ 

> 

n 

£ 

in 

> 

2 

1929 

lb 

1934 

- 1 b 

1937 

2 3 

1941 

3.9 

190) 

-0 3 

1934 

— H 

1938 

-10 

1**42 

2 1 

1931 

— 1 4 

) 935 

0 9 

1939 

0 4 

1943 

—0 9 

1932 

-2 6 

19% 

10 

1940 

2 3 

1944 

-0 8 


V\\% —0 7 


8 










Chapter 9 

Graphic Presentation 


In the preceding chapter we discussed the various types of diagrams which are commonly employed for presenting categorical series. However, there are other types of series which cannot be represented by one or more dimensional diagrams, but can be located with respect to two or more dimensions.¹ Such charting is known as graphic presentation and is used to present the following two types of series: (i) time series, and (ii) frequency series.

Rectangular Co-ordinates

Usually a graph is represented by rectangular co-ordinates in two dimensions. A horizontal line is chosen as the X-axis or axis of abscissae, and a line perpendicular to it as the Y-axis or axis of ordinates. The point of intersection of the two lines is the origin or zero point. Distances measured towards the right or upward are reckoned as positive, and distances measured towards the left or downward are negative. In practice the graph is plotted upon a grid or net-work of fine lines which makes the plotting of points easier.


¹ 'Line charts are used to present historical data due to their superiority over bar-charts on several grounds. First, the movement of the variable is usually continuous through time, so there is no logical reason for showing the discontinuity which the space between the bars suggests. Second, the line chart creates a more accurate impression because the artificial plateaus at the tops of columns are replaced by connected points. Third, less time is required to plot points and connect them with lines than to draw many bars. Fourth, in reading the line chart the eye follows the line and appraises the distance of the line from the base of the chart or some other point of reference, so that the reader gets a quick impression of both the movement in time series and their absolute magnitudes.'

Neiswanger: Elementary Statistical Methods, p. 174.




GRAPHIC PRESENTATION 115



Fig. 9.1: The four quadrants: Quadrant I (+X, +Y), Quadrant II (-X, +Y), Quadrant III (-X, -Y), Quadrant IV (+X, -Y)

Charting Time Series

Usually the X-axis represents time and the Y-axis the variable. As there are no negative values of 'time' or of the variable for historical data, it is not necessary to draw the negative sides. The axes are ruled to the right and upward from the origin, and so, instead of choosing the origin in the centre of the grid, it is chosen near the lower left corner of the chart as shown in fig. 9.2. Many other precautions to be observed in the construction of a graph will be discussed at a later stage in this chapter.

Here it may be noted that the scale along the Y-axis should begin from 0. This is not necessary for the scale along the X-axis, as there is no zero in time.

Fig. 9.2 is a line chart of the data given in Table 9.1. It shows the increasing number of industrial disputes or work stoppages during the Second War and after. The largest number of disputes, in the year 1947, suggests that the general turmoil and dislocation following the partition and communal frenzy contributed considerably to industrial disharmony.




TABLE 9.1

Industrial Disputes in India

Year     No. of Disputes     Year     No. of Disputes

1939          409            1945           820
1940          322            1946          1620
1941          359            1947          1775
1942          694            1948          1239
1943          716            1949           920
1944          658




For the construction of fig. 9.2 ordinary graph paper is used. It is not essential to draw the ordinates corresponding to the variable quantities which are being represented graphically; all that is necessary is to represent the end of each ordinate by plotting a point on the paper. Then these consecutive points are connected by means of straight lines. The consecutive points are joined with a twofold object:

1. The reader may easily follow the movement in the values of the variable from one year to another.

2. The reader may also interpolate from the diagram an intermediate value of the variable for a year for which he has no data. Thus we would be justified in assuming that a point half way along the line connecting the tops of the ordinates corresponding to the industrial disputes of 1943 and 1944 would correspond to the industrial disputes of the first half year of 1944 (when 1943 refers to the twelve months of the calendar year), on the assumption that the increase in the first half year of 1944 equalled that in the second half year of 1944.
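Reading an intermediate value off the straight line joining two plotted points is ordinary linear interpolation. A minimal sketch follows, using the 1943 and 1944 figures of Table 9.1; the function name `interpolate` is an assumption made for illustration.

```python
def interpolate(x0, y0, x1, y1, x):
    """Value at x on the straight line joining (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# Industrial disputes: 716 in 1943 and 658 in 1944 (Table 9.1).
half_way = interpolate(1943, 716, 1944, 658, 1943.5)
print(half_way)  # 687.0
```

The result is only as good as the assumption that the change was spread evenly between the two plotted points.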

False Base Line

When the fluctuations in a variable are small relative to its size and it is desired to visualize these fluctuations properly, the vertical scale may be amplified. This can be done if, instead of showing the entire scale from zero to the highest value involved, only as much of it is shown as is necessary for the purpose, and that portion which lies between zero and the lowest value of the variable is left out. Table 9.2 shows the annual rainfall in inches in Assam. The variable does not fall below 85" in any year. In order to appreciate the fluctuations properly, the scale can be amplified within the available space by omitting a section of the vertical scale. This omission should be marked by an interruption in the grid rulings, lest the reader base his judgment upon the apparent position of the curve with reference to zero. See fig. 9.3.
TABLE 9.2

Yearly Actual Rainfall (inches) in Assam

Year     Rainfall     Year     Rainfall

1935      101.98      1943      104.59
1936      100.78      1944       99.03
1937       89.89      1945      106.16
1938      104.17      1946      103.41
1939       93.31      1947      108.75
1940       94.02      1948      117.84
1941      108.53      1949      109.10
1942      104.94






Fig. 9.4 













Just as the unused portion of the vertical scale can be deleted, the unused portion of the horizontal scale can also be deleted in a similar manner if conditions demand such a procedure. See fig. 9.4.

Comparison of Time Series

The method of graphic representation is specially suited to bring out the relationships between events which are ordered in time. This can be achieved by the simple device of plotting several curves (one for each variable) upon the same graph, with identical horizontal and vertical scales. A study of the curves so obtained will enable us to visualize the significant relationships between the variables under consideration.

In fig. 9.5 are plotted the share price indices of the two war cycles in India. A study of these curves shows that the second war cycle of 1939-48 repeated the first war cycle in a remarkably parallel movement. This apparent similarity may be due to the overwhelming influence of war conditions in both the periods. In this figure (fig. 9.5) a comparison has been made between the same variable at two different periods but at the same place. A comparison may also be made between the same variable at different places during the same period. Table 9.3 sets out the share price indices of various countries for the second war cycle.

TABLE 9.3

Share Price Indices of Various Countries for the Second War Cycle

Country      1939  1940  1941  1942  1943  1944  1945  1946  1947  1948

Canada        100    84    70    66    86    86   103   120   109   118
U.S.A.        100    93    85    75   100   108   130   151   135   138
Australia     100   100   103    93   111   112   116   132   155   168
U.K.          100    83    90   107   129   141   153   170   172   155
India         100   115   125   127   172   194   210   278   210   174







Fig. 9.5 

A comparative study of the various curves in fig. 9.6 demonstrates the unity underlying much apparent diversity. The extent of diversity is reflected by their movements. Though the Indian series has a range of fluctuation far exceeding that of any other country, the movements of all of them are similar. The factors affecting share prices in the second war cycle were almost identical in all countries, viz., inflation, the pursuit of a cheap money policy and the maintenance of low interest rates inside a closed economy by almost all important countries of the world.

It must be noted that even with a simple series, there are two points to be considered in graphical representation: (a) the curve is to show properly the actual sizes of the numbers in the series, and (b) the curve is to indicate properly changes in these numbers so that they may be compared easily. If we plot two series on a graph, there are six points to be considered:






Fig. 9.6

1. The sizes of the numbers of the first series.

2. The sizes of the numbers of the second series.

3. The changes in the numbers of the first series.

4. The changes in the numbers of the second series.

5. A comparison of the sizes of numbers.

6. A comparison of the changes of numbers.

It is quite conceivable that a graph may correctly indicate some aspects and may fail to indicate correctly other aspects. Figures 9.5 and 9.6 reasonably serve their objects in view, but figure 9.7 is drawn on such a scale that changes in the different curves are hardly discernible.

Fig. 9.7

In order that changes between different series may be comparable it is necessary either to reduce the series to percentages (as is done in figs. 9.5 and 9.6) or to adopt a 'Ratio Scale' along the axis of Y. Ratio scale curves are discussed later on in this chapter. The former method is recommended for use by students.
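Reducing a series to percentages means expressing every value as a percentage of a chosen base value, so that series of very different sizes start from the same level. The sketch below is only an illustration: the two series are hypothetical figures, and the function name `to_base_100` is likewise an assumption.

```python
def to_base_100(series, base_index=0):
    """Express each value as a percentage of the value at `base_index`."""
    base = series[base_index]
    return [round(100 * value / base, 1) for value in series]

# Two hypothetical series of very different absolute size.
series_a = [200, 230, 250]
series_b = [4000, 4400, 5200]

print(to_base_100(series_a))  # [100.0, 115.0, 125.0]
print(to_base_100(series_b))  # [100.0, 110.0, 130.0]
```

Plotted on the same natural scale the first series would be almost flat beside the second; after reduction to percentages the changes in both can be compared directly.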






Component-part Line Chart

The graph or line chart may also be adopted to perform a service similar to that rendered by the sub-divided bar diagram. When a total is composed of more than one component, the relationship among these components can be represented graphically in a component-part line chart. This method is applicable to time series only.

Figure 9.8 is a component-part line graph showing the financial results of Indian Government Railways since the outbreak of the Second World War. The portion below the curve representing gross traffic receipts has been divided into three components: (a) one showing working expenses, (b) one (the middle one) showing the amount of depreciation year by year, (c) the remaining one indicating net earnings (including payments to worked lines). Indian Railways earned the highest profit in the year 1944-45. The low income in 1947-48 might be due to the dislocation and communal frenzy following the partition of India.



Fig. 9.8 

Semi-Logarithmic or Ratio Scale 

In the foregoing discussion of graphic representation we have used the natural-number scale (or natural scale), according to which equal spaces on the vertical axis represent equal absolute magnitudes. This method is quite suited to cases where a study of absolute changes in a variable is desired. But when it is necessary to emphasise the rate of change in a variable, or when a comparison is to be made between the relative changes of two or more variables and such variables differ one from another as regards their respective magnitudes or are expressed in different units, the natural scale will not serve the purpose.

Study the following:

Date            Turnover of Firm A    Turnover of Firm B

1 September           100                   500

2 September           150                   550

In the above table we find that the turnover of firm A as well as of firm B has increased by Rs. 50. But the increase of Rs. 50 in the case of firm A is much more important than it is in the case of firm B. In the former case it indicates an increase of 50 per cent, whereas in the latter case the increase indicated is only 10 per cent. Thus if we take into account only the absolute increase we will not be able to appreciate the relative importance of the increase in the turnover of the two firms.
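The two percentages quoted for the firms follow from dividing the increase by the earlier figure. A minimal sketch, in which the function name `percent_change` is an assumption made for illustration:

```python
def percent_change(old, new):
    """Relative change from `old` to `new`, expressed in per cent."""
    return (new - old) * 100 / old

firm_a = percent_change(100, 150)
firm_b = percent_change(500, 550)

print(firm_a)  # 50.0
print(firm_b)  # 10.0
```

The absolute increase is Rs. 50 in both cases; only the relative change distinguishes the two firms.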

When an analysis of relative changes is desired, the ratio scale¹ or semi-logarithmic scale is adopted. Its distinguishing characteristic is the spacing on the Y-axis, which is scaled logarithmically. The arithmetic scale is used on the horizontal axis. The term 'semi-logarithmic' is used because the vertical axis alone is logarithmically scaled. When the horizontal axis is also scaled in the same manner, it is referred to as a 'log-log' or 'double logarithmic' scale.

In the diagrams which have so far been considered, equal vertical spaces represent equal absolute amounts. Thus an increase from 1 to 2 occupies the same distance on the Y-axis as an increase from 4 to 5. But this is not so in the case of the ratio scale. In a ratio scale the length of the interval between two values on the scale depends only on the ratio between these two values (it is proportional to the logarithm of that ratio). The distance from 1 to 2 will be much greater than that from 4 to 5: the former represents an increase of 100 per cent and the latter of only 25 per cent, so the former distance is about three times the latter (log 2 / log 1.25 = 3.1 nearly). The distance between 1 and 2 will be the same as that between 2 and 4, or 4 and 8, or 8 and 16, as in each case the ratio between the pair is 1 : 2.

¹ According to Wesley C. Mitchell, the idea of the ratio chart was introduced by Jevons in 1863-65, but the ratio chart did not come into general usage until its merits were explained by Prof. Irving Fisher and James A. Field.


Ratio Scale (a line graduated 1, 2, 3, 4, 5, 6, 7, 8, 9 at logarithmic intervals)

In the above ratio scale, we note that equal distances are occupied by the pairs 1 and 3, 3 and 9, 4 and 12. This is because the ratio between the two values in each case is 1 : 3. Thus we can say that equal spaces on the ordinary chart represent equal absolute differences, while equal spaces on the ratio scale represent equal logarithmic differences or equal ratios.
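That equal ratios occupy equal distances can be verified by taking the difference of the logarithms of the two values. The sketch below measures distance in units of one full cycle (1 to 10); the function name `ratio_scale_distance` is an assumption made for illustration.

```python
import math

def ratio_scale_distance(low, high):
    """Distance between two marks on a ratio scale, in units of one cycle."""
    return math.log10(high) - math.log10(low)

# Pairs in the ratio 1 : 3 and pairs in the ratio 1 : 2.
pairs_1_to_3 = [(1, 3), (3, 9), (4, 12)]
pairs_1_to_2 = [(1, 2), (2, 4), (4, 8), (8, 16)]

for low, high in pairs_1_to_3 + pairs_1_to_2:
    print(low, high, round(ratio_scale_distance(low, high), 3))
```

Every pair in the ratio 1 : 3 gives the same distance (log 3), and every pair in the ratio 1 : 2 gives the same, smaller, distance (log 2).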

From the above it follows that a series increasing at a constant rate, an equal per cent each year, will plot as a straight line on the ratio chart. If, on the other hand, a straight line tendency is apparent on the natural or arithmetic scale, the series is growing by constant absolute amounts.
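A straight line on the ratio chart means equal steps in the logarithm from one year to the next. The sketch below compares two hypothetical series, one growing by 10 per cent each year and one growing by 10 units each year; the figures and the function name `log_steps` are assumptions made for illustration.

```python
import math

def log_steps(series):
    """Year-to-year differences of the logarithms of a series."""
    logs = [math.log10(value) for value in series]
    return [round(later - earlier, 6) for earlier, later in zip(logs, logs[1:])]

# Hypothetical series: constant rate versus constant amount.
constant_rate = [100 * 1.10 ** year for year in range(5)]
constant_amount = [100 + 10 * year for year in range(5)]

print(log_steps(constant_rate))    # equal steps: a straight line on the ratio scale
print(log_steps(constant_amount))  # shrinking steps: a curve that flattens out
```

The first list consists of one repeated value, while the second falls steadily, which is why a series growing by equal amounts bends towards the horizontal on the ratio scale.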

Shape of the Curve on the two Scales 

When a variable increases by a constant amount the line on
the arithmetic scale would be a straight line sloping upward as
curve A in fig. 9.9. When plotted on ratio scale, however, the
line would be concave to the base and if it were extended far
enough it would run almost parallel to the X-axis as curve B.
This happens because a series which increases by equal
magnitudes is increasing at a diminishing rate.

When the time series increases at a constant proportional
rate, the curve on the ordinary scale would be a line ascending
rapidly, i.e., convex to the base as curve C in fig. 9.9. If the
same data are plotted on the semi-logarithmic scale, the curve
will be a straight line like curve D in fig. 9.9. In cases
where series diminish by a constant amount their shape
would be like curves E and F. In curve E (drawn on arithmetic
scale) the line is straight, sloping downward. In curve F
(drawn on ratio scale) the line is rapidly declining. This is
so because as the base value becomes smaller, the constant
decrease is a progressively greater percentage.

Fig. 9.9
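
The reasoning behind the shapes of these curves can be illustrated with a few figures. The sketch below uses made-up series (a start of 100, steps of 50 or 15, a rate of 50 per cent); these numbers are assumptions chosen only for illustration and are not data from the text.

```python
def percentage_changes(series):
    """Percentage change from each item of the series to the next."""
    return [100 * (b - a) / a for a, b in zip(series, series[1:])]

# Hypothetical series, six periods each.
constant_amount = [100 + 50 * t for t in range(6)]    # rises by 50 each period
constant_rate = [100 * 1.5 ** t for t in range(6)]    # rises by 50 per cent each period
declining_amount = [100 - 15 * t for t in range(6)]   # falls by 15 each period

amount_pct = percentage_changes(constant_amount)
rate_pct = percentage_changes(constant_rate)
decline_pct = percentage_changes(declining_amount)

print([round(p, 1) for p in amount_pct])   # percentage growth keeps shrinking
print([round(p, 1) for p in rate_pct])     # percentage growth stays the same
print([round(p, 1) for p in decline_pct])  # percentage fall keeps growing
```

A constant amount gives a diminishing percentage rise (curve B flattens), a constant rate gives a constant percentage rise (curve D is straight), and a constant decrease gives an ever larger percentage fall (curve F steepens).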

Construction of a Ratio Scale 

The construction of a ratio scale involves two steps : 

1. A line is drawn divided into 10 parts of equal
distances and marked 0, 1, 2, 3, and so on. This will be an
arithmetic scale. (Fig. 9.10)

2. The second step is to find out the logarithms of the
integers 1 to 10. Take a logarithm of .1 as equivalent to 1 cm.









Fig. 9.10 

Thus .301 is 3.01 cm., .477 is 4.77 cm., and so on. On the other
line drawn parallel to the line drawn in the first step, distances
are taken in cm. The logarithm of 1 is 0, so 1 will be written
at a distance of 0 cm. and will be the starting point. The
logarithm of 2 is .301, so 2 will be written at a distance of 3.01
cm. from 1. The logarithm of 3 is .477, so 3 will be written
at a distance of 4.77 cm. from 1, and so on. Thus the total
distance between the point marked 1 on the scale and that
marked 10 is 10 cm.
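
The two steps above can be sketched in Python as follows. The constant `CM_PER_LOG_UNIT` and the function `tick_positions` are illustrative names; the only assumption taken from the text is that a logarithm of .1 corresponds to 1 cm, so that a logarithm of 1 corresponds to 10 cm.

```python
import math

# A logarithm of .1 is taken as 1 cm, so one whole unit of log10 spans 10 cm.
CM_PER_LOG_UNIT = 10.0

def tick_positions(cm_per_log_unit=CM_PER_LOG_UNIT):
    """Distance in cm from the mark '1' at which each integer 1..10 is written."""
    return {n: round(cm_per_log_unit * math.log10(n), 2) for n in range(1, 11)}

positions = tick_positions()
for n, cm in positions.items():
    print(f"{n:>2} is written {cm:5.2f} cm from the starting point")
```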

In the semi-logarithmic scale, the same marks on the
scale can be taken to correspond to different groups of numbers
by multiplying each figure in the original scale by the same
amount. So the points on the scale corresponding to 1 to 10
may also be taken as 10 to 100 or 100 to 1,000 and so on. It
should, however, be borne in mind that there is no zero on the
ratio scale, as log 0 is an indefinitely large negative number.

Cycles in a Ratio Scale

The above drawn scale runs from 1 to 10 or 10 to 100.



Fig. 9.11 (three-cycle ratio scale, graduated from 100 to 100,000)
















The range from 1 to 10 is known as one ‘cycle’ of a ratio
scale. If the data of the series run from 1 to 10 or 10 to 100
or 100 to 1,000, they can be plotted within one cycle of the
ratio scale. If, however, the data run from 1 to 16 or 10 to
270, and so on, a two-cycle ratio scale is employed. If the data
vary from, say, 10 to 9,000, a three-cycle ratio scale is essential.
A three-cycle ratio scale is shown in fig. 9.11.
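
The number of cycles a series needs can be worked out from its extreme values. The sketch below is one way of doing so, assuming that each cycle starts at a power of 10; the function name `cycles_needed` is hypothetical.

```python
import math

def cycles_needed(lowest, highest):
    """Number of 10-fold cycles required to plot data from lowest to highest."""
    if lowest <= 0 or highest < lowest:
        raise ValueError("a ratio scale needs 0 < lowest <= highest")
    first_cycle_start = math.floor(math.log10(lowest))  # power of 10 at or below the lowest value
    last_cycle_end = math.ceil(math.log10(highest))     # power of 10 at or above the highest value
    return max(1, last_cycle_end - first_cycle_start)

for low, high in [(1, 10), (10, 100), (1, 16), (10, 270), (10, 9000)]:
    print(low, high, cycles_needed(low, high))
```

For the ranges quoted in the text this gives one cycle for 1 to 10 and 10 to 100, two cycles for 1 to 16 and 10 to 270, and three cycles for 10 to 9,000.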

Rates of change may be represented graphically in either of
two ways : (1) by plotting the actual amounts themselves
on a semi-logarithmic graph, or (2) by plotting the logarithms
of the actual amounts on a natural scale.

The semi-logarithmic scale is employed usefully under the
following circumstances :

1. When comparison between series of widely different
magnitudes is desired. About this James A. Field wrote, ‘It
is far superior to the natural scale for effecting comparison
when very small and very large quantities must be taken into
account concurrently. Whenever a historical curve records
extreme growth the same advantage is found. It is not
necessary to dwarf the small beginnings in order to keep the later
development within manageable dimensions.’

2. When comparison between series of different units is
desired.

3. When the data are to be examined to see whether they
are characterised by a constant rate of change.

4. When proportionate variations are more important than
absolute variations.

The merits of the ‘semi-logarithmic’ chart have been
summarised by Prof. Irving Fisher¹ as :

‘The eye reads a ratio chart more rapidly than a difference
chart or a table of figures.

We may recapitulate what most easily catches the eye as
follows :

(These may be considered to be the rules for interpretation
of semi-log curves.)

¹ Quoted by Smith.





1. If we see a curve ascending, and nearly straight, we
know that the statistical magnitude it represents is increasing
at a nearly constant rate.

2. If the curve is descending, and nearly straight, the
statistical magnitude is decreasing at a nearly constant rate.

3. If the curve bends upward, the rate of growth is
increasing.

4. If down, decreasing.

5. If the direction of the curve in one portion is the same
as in some other portion it indicates the same percentage rate
of change in both.

6. If the curve is steeper in one portion than in another,
it indicates a more rapid rate of change in the former than in
the latter.

7. If two curves on the same ratio chart run parallel they
represent equal percentage rates of change.

8. If one is steeper than another the first is changing at a
faster rate than the second.’

Charting Frequency Series 

We have had occasion to discuss frequency distributions in
an earlier chapter. Here it is intended to explain the method
of charting such distributions.

Charting Frequency Distribution of Discrete Type 

The simplest way of representing graphically a discrete 
frequency series is the line or bar frequency diagram. 

The data in table 9.4 are represented in fig. 9.12. In this
fig. heavy vertical lines are used. The length of these lines
indicates the frequency of that size on which the line is drawn.
Sometimes, solid bars are preferred to lines for data of this
nature.

TABLE 9.4

Frequency Distribution of Numbers of Heads which
appeared when Six Coins were tossed 128 times

No. of heads        Frequency
     0                   2
     1                  10
     2                  28
     3                  44
     4                  30
     5                  13
     6                   1

   Total               128



Fig. 9.12

It should be remembered that ordinates are not to be
connected in the frequency series of discrete type. It is true
that a line connecting the various ordinates in such a case as
this may aid the eye in comparing the respective heights of the
ordinates, but it does not then establish the distribution. The
connecting line will merely represent a trend of frequencies at
the positions at which they occur and will not show the likely
frequency at every size or point on the X-axis, as would be the
case with a line representing a continuous series.

Charting Frequency Distribution of Continuous Type

There are a variety of ways of picturing a frequency
distribution of continuous type, viz., the histogram, the frequency
polygon, the smoothed frequency curve and the ogive curve.





Histogram¹

The term histogram must not be confused with the term
‘historigram’ which stands for time charts. Probably the
histogram or column diagram is the best way of presenting
graphically a simple frequency distribution. It is constructed
by erecting upon the class intervals columns or rectangles whose
heights are proportional to their frequencies.


Table 9.5 below gives the distribution of weight of 1,515
students of a college.

TABLE 9.5

(Weight in Pounds)

     Size              Frequency
  91 to 100                 5
 101 to 110                34
 111 to 120               139
 121 to 130               300
 131 to 140               367
 141 to 150               319
 151 to 160               205
 161 to 170                76
 171 to 180                43
 181 to 190                16
 191 to 200                 3
 201 to 210                 4
 211 to 220                 3
 221 to 230                 1

     Total              1,515


To prepare a histogram from the above data, the following
steps should be taken :

1. The fourteen class intervals of 10 pounds each are laid
off to a convenient scale on a horizontal axis, the value
increasing from left to right. The end points of these intervals should
correspond to the real limits of the class. It means that the
marking on the scale must show the numerical values of the
class boundaries or real class limits. In the above data the
real class limits are :

  90.5 to 100.5          160.5 to 170.5
 100.5 to 110.5          170.5 to 180.5
 110.5 to 120.5          180.5 to 190.5
 120.5 to 130.5          190.5 to 200.5
 130.5 to 140.5          200.5 to 210.5
 140.5 to 150.5          210.5 to 220.5
 150.5 to 160.5          220.5 to 230.5

¹ The term ‘histogram’ was first used by Karl Pearson in 1895 as a term
for a common form of representation, i.e., ‘by columns marking
as areas the frequency corresponding to the range of their base’. The
term is not related in derivation to ‘history’ as is sometimes supposed.



Fig. 9.13 

2. The next step would be to erect a scale of frequency
at right angles to the scale of size or X-axis. It is not essential
to show the zero point on the horizontal axis, but it is
necessary to show it on the vertical axis or on the scale of
frequency.

3. The third step would be to erect the bars. The length
of the bar would represent the frequency of that particular class
interval.

The data of table 9.5 are illustrated graphically in fig. 9.13.
Thus the histogram consists of a set of adjacent rectangles or
bars the bases of which equal the true class width or class
interval and the altitudes of which equal the corresponding
class frequencies.
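
The steps above can be sketched in Python. The frequencies are those of table 9.5 as they agree with the cumulative totals of table 9.6; the helper `real_limits` is a hypothetical name, and it assumes weights were recorded to the nearest pound, so that each stated limit is widened by half a pound.

```python
# Stated class limits (91 to 100, 101 to 110, ...) and frequencies of table 9.5.
stated_classes = [(91 + 10 * i, 100 + 10 * i) for i in range(14)]
frequencies = [5, 34, 139, 300, 367, 319, 205, 76, 43, 16, 3, 4, 3, 1]

def real_limits(lower, upper, unit=1):
    """Real class limits when values are recorded to the nearest `unit`."""
    half = unit / 2
    return (lower - half, upper + half)

# Each bar of the histogram: (left edge, right edge, height).
bars = [(*real_limits(lo, hi), f) for (lo, hi), f in zip(stated_classes, frequencies)]

for left, right, height in bars:
    print(f"{left:6.1f} to {right:6.1f} : {height}")
```

Because the right edge of each bar equals the left edge of the next, the rectangles are adjacent, as the description of the histogram requires.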

Frequency Polygon 

If we join the middle points of the tops of the adjacent
rectangles of the histogram (fig. 9.13) with line segments, as
indicated in fig. 9.14, a frequency polygon is obtained. When
the polygon is continued to the X-axis just outside the range
of weights, as in the figure, the total area under the polygon will
be equal to the total area under the histogram. Note that in
fig. 9.14 the triangles 1, 2, 3, ..., 14 are congruent respectively
to triangles 1', 2', 3', ..., 14'.

Fig. 9.14

It is not essential first to draw the histogram in order to obtain
the frequency polygon. It can be drawn without erecting
rectangles as in fig. 9.15.



The frequency polygon is constructed as follows :

1. The scale should be marked in the numerical values
of the mid-points of intervals.

2. Erect ordinates on the mid-points of the intervals,
the length or altitude of an ordinate representing the
frequency of the class on whose mid-point it is erected.

3. Join the tops of the ordinates and extend the connecting
line to the scale of sizes.

Note on fig. 9.14 : the numbered triangles are congruent because
vertical angles are equal, right angles are equal, and the sides which
are halves of the tops of rectangles are equal.
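
The construction above, and the equality of areas claimed for the polygon and the histogram, can be checked with a short Python sketch. It assumes the frequencies of table 9.5 and a class width of 10 pounds; the names `vertices` and `area_under` are illustrative.

```python
frequencies = [5, 34, 139, 300, 367, 319, 205, 76, 43, 16, 3, 4, 3, 1]
class_width = 10
first_mid_point = (90.5 + 100.5) / 2  # mid-point of the first class

# Step 1: mid-points of the fourteen class intervals.
mid_points = [first_mid_point + class_width * i for i in range(len(frequencies))]

# Steps 2 and 3: ordinates on the mid-points, with the line extended to the
# X-axis one class width beyond each end of the distribution.
vertices = (
    [(mid_points[0] - class_width, 0)]
    + list(zip(mid_points, frequencies))
    + [(mid_points[-1] + class_width, 0)]
)

def area_under(points):
    """Area under a line through the points (sum of trapezoids)."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

polygon_area = area_under(vertices)
histogram_area = class_width * sum(frequencies)
print(polygon_area, histogram_area)
```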

Smooth Frequency Curve 

Table 9.5 shows the distribution of weights of 1,515 students
of a college, arranged according to class intervals of 10 pounds
each and figs. 9.13 and 9.14 exhibit the corresponding
histogram and polygon. The weights of the students range from
90.5 pounds to 230.5 pounds. If we could make our class
intervals smaller and smaller, the columns in fig. 9.13 would
become narrower and narrower. Likewise the number of cases
in the classes would become smaller and smaller. If the class
interval is reduced to 5 pounds (from 10 pounds) fig. 9.14 will
take the shape of fig. 9.16.



Fig. 9.16 

In reducing the class interval, it might happen that a
particular bar might become very small in length or even might
disappear. But if we have been able to record the weight of a
large number of cases, we could make the bars narrower
without making them disappear altogether. When changes, both
in the class interval and number of students considered, take
place simultaneously (the number of cases being made very
large and the width of class interval very small), the frequency
polygon takes more and more the appearance of a smooth
curve. Thus the effect of increasing total frequency and
decreasing the class width would have been to ‘smooth out’ the
frequency polygon and to make it look like a curve. If we
would study the weights of 3,030 students and group the results
into class intervals of 5 pounds, the table that may be so
obtained is represented graphically in fig. 9.17.



Fig. 9.17 

If for want of adequate information it is not possible to
study a larger number of cases and a smaller size of class intervals,
we can draw a frequency curve free hand. But smoothing the
polygon free hand requires some experience and intelligence
which, if not applied, would lead to opposite results. While
smoothing a frequency curve care should be taken to see that
the area enclosed by the curve is neither more nor less than the
area of the rectangles of the histogram.

Ogive Curve or Cumulative Frequency Curve

So far we have been discussing the charting of simple
frequency distributions where each frequency refers to the
measurement or the class interval against which it is placed.
Sometimes it may become necessary to know the number of
items whose values are more or less than a certain amount. We
may, for example, be interested in knowing the number of
students whose weight is less than 140.5 lb. or more than (say)
150.5 lb.

To get this information it is necessary to change the form of
the frequency distribution from a ‘simple’ to a ‘cumulative’
distribution. In a cumulative frequency distribution the
frequency of each class is made to include the frequencies of all the
lower or all the upper classes depending upon the manner in
which cumulation is done. The manner of cumulation, in its
turn, depends upon our purpose. If we are interested to know
the number of items that are ‘less than’ a certain size the
cumulation will proceed from the least to the greatest size; and
the series so obtained will be called ‘less than’ cumulative
frequency distribution. When it is desired to know the number
of items whose sizes are ‘more than’ a certain size, cumulation
will proceed from the greatest to the least, and the series so
obtained will be called ‘more than’ cumulative frequency
distribution.

Construction of a ‘Less Than’ Cumulative Frequency Table. If
we look carefully at the lower end of table 9.5 we will find that
5 students have a weight of 100.5 lb. or less (considering the
real limits of the class interval); 34 have between 100.5 lb. and
110.5 lb. The total number of students whose weight is 110.5 lb.
or less is, therefore, 34+5, or 39. The number of students
between 110.5 lb. and 120.5 lb. is 139. Adding this to the
accumulated frequency of the two classes (139+34+5 = 178)
we find that 178 students have a weight of 120.5 lb. or less.
When the frequencies of each class are added in this way
the distribution given in column 2 of table 9.6 is secured. With
the help of this distribution it is possible to find out the number
of students whose weight is less than a certain amount. Thus,
there are 1,507 students whose weight is less than 200.5 lb.

TABLE 9.6

Distribution of Weight of 1515 Students
of a College in Pounds

Class Interval                  Cumulative Frequency
                                    (less than)
Less than 100.5 pounds                    5
Less than 110.5 pounds                   39
Less than 120.5 pounds                  178
Less than 130.5 pounds                  478
Less than 140.5 pounds                  845
Less than 150.5 pounds                 1164
Less than 160.5 pounds                 1369
Less than 170.5 pounds                 1445
Less than 180.5 pounds                 1488
Less than 190.5 pounds                 1504
Less than 200.5 pounds                 1507
Less than 210.5 pounds                 1511
Less than 220.5 pounds                 1514
Less than 230.5 pounds                 1515
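
The running addition that produces the ‘less than’ column can be sketched in Python with `itertools.accumulate` from the standard library. The frequencies are those of table 9.5; the variable names are illustrative.

```python
from itertools import accumulate

# Simple frequencies of table 9.5, from the lowest class to the highest.
frequencies = [5, 34, 139, 300, 367, 319, 205, 76, 43, 16, 3, 4, 3, 1]

# Upper real limits of the classes: 100.5, 110.5, ..., 230.5.
upper_limits = [100.5 + 10 * i for i in range(len(frequencies))]

# Cumulate from the least to the greatest size.
less_than = list(accumulate(frequencies))
less_than_table = dict(zip(upper_limits, less_than))

for limit, count in less_than_table.items():
    print(f"Less than {limit} pounds : {count}")
```

Looking up `less_than_table[200.5]` gives the 1,507 students quoted in the text.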


Construction of a ‘More Than’ Cumulative Frequency Table.
When a ‘more than’ cumulative frequency table is to be
constructed we start at the upper end of the distribution. In
table 9.5 we find that there is one student whose weight is
220.5 lb. or more; and 3 whose weight is between 210.5 lb. and
220.5 lb. Thus the total number of students whose weight is
210.5 lb. or more is 3+1 = 4. The frequency of the next lower
class is 4. Adding this to the cumulated frequencies of the two
higher classes we get 4+3+1 = 8, the number of students whose
weights are 200.5 lb. or more.

When the frequencies of each group are cumulated in this
way we get table 9.7. From this table we can find the number
of students whose weight is more than a certain amount. Thus
there are 146 students whose weight is more than 160.5 lb.

TABLE 9.7

Distribution of Weight of 1515 Students
of a College in Pounds

Class Interval                  Cumulative Frequency
                                    (more than)
More than  90.5 pounds                 1515
More than 100.5 pounds                 1510
More than 110.5 pounds                 1476
More than 120.5 pounds                 1337
More than 130.5 pounds                 1037
More than 140.5 pounds                  670
More than 150.5 pounds                  351
More than 160.5 pounds                  146
More than 170.5 pounds                   70
More than 180.5 pounds                   27
More than 190.5 pounds                   11
More than 200.5 pounds                    8
More than 210.5 pounds                    4
More than 220.5 pounds                    1
More than 230.5 pounds                    0


It must always be remembered that the frequencies of a
‘less than’ cumulative frequency table refer to the upper limit
of the class intervals and those of the ‘more than’ cumulative
frequency table refer to the lower limit.
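
The ‘more than’ cumulation can be sketched in the same way, starting from the upper end of the distribution. The frequencies are again those of table 9.5; attaching each total to the lower real limit of its class follows the rule just stated, and the variable names are illustrative.

```python
from itertools import accumulate

# Simple frequencies of table 9.5, from the lowest class to the highest.
frequencies = [5, 34, 139, 300, 367, 319, 205, 76, 43, 16, 3, 4, 3, 1]

# Lower real limits of the classes: 90.5, 100.5, ..., 220.5.
lower_limits = [90.5 + 10 * i for i in range(len(frequencies))]

# Cumulate from the greatest to the least size, then restore the original order.
more_than = list(accumulate(reversed(frequencies)))[::-1]
more_than_table = dict(zip(lower_limits, more_than))
more_than_table[230.5] = 0  # no student weighs more than the highest real limit

for limit, count in more_than_table.items():
    print(f"More than {limit} pounds : {count}")
```

The entry for 160.5 lb. is 146, as quoted in the text, and the entry for 200.5 lb. is 4+3+1 = 8.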

Charting Cumulative Frequency Distributions 

In charting cumulative frequency distributions the following
steps are necessary :

(1) Scale the cumulative frequencies along the Y-axis, and
class intervals along the X-axis.

(2) The scale along the Y-axis should be such that it may
accommodate the total frequency.

(3) The accumulated frequencies for each class are plotted :

(a) against the upper limit of the class in the case of ‘less
than’ cumulative frequency distributions (see fig. 9.18),
and

(b) against the lower limit of the class in the case of
‘more than’ cumulative frequency distributions (see
fig. 9.19).







The data of tables 9.6 and 9.7 are used in the figures 9.18 
and 9.19. 



Fig. 9.19 

Fig. 9.20 brings out the relationship that exists between a
simple frequency and a cumulative frequency distribution.

Fig. 9.20

Cumulative Percentage Curve

Sometimes the cumulative frequencies are converted into
cumulative percentages, by dividing each cumulative frequency
by the total of frequencies and multiplying the quotient by 100.
On the vertical scale, cumulative percentages are marked
rather than cumulative frequencies. The cumulative
percentage curve is generally employed when a comparison between
two or more distributions is desired by plotting them on the
same pair of axes. This can be illustrated by a simple example.
Table 9.8 gives the weights of the children of two sections of a
middle school class.
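
The conversion to cumulative percentages can be sketched in Python. The frequencies below are those of table 9.8 as they agree with its cumulative columns and with the section totals of 50 and 78, a class with no entry being taken as zero; the function name `cumulative_percentages` is illustrative.

```python
from itertools import accumulate

def cumulative_percentages(frequencies, digits=1):
    """Each cumulative frequency divided by the total and multiplied by 100."""
    total = sum(frequencies)
    return [round(100 * c / total, digits) for c in accumulate(frequencies)]

# Frequencies for the classes 30-33, 34-37, ..., 66-69 seers.
section_a = [2, 2, 14, 6, 2, 8, 10, 4, 0, 2]       # 50 students
section_b = [10, 10, 16, 10, 10, 8, 10, 2, 0, 2]   # 78 students

pct_a = cumulative_percentages(section_a)
pct_b = cumulative_percentages(section_b)
print(pct_a)
print(pct_b)
```

Since both series now end at 100, the two sections can be compared on the same vertical scale even though they differ in size.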


TABLE 9.8

Frequency Distribution

(Showing weights of the children of two sections of a
middle school class)

Weights       Section A (50 students)       Section B (78 students)
in seers      Fre.  Cum. Fre.  Cum. %       Fre.  Cum. Fre.  Cum. %
30-33           2       2         4          10      10       12.8
34-37           2       4         8          10      20       25.6
38-41          14      18        36          16      36       46.2
42-45           6      24        48          10      46       59.0
46-49           2      26        52          10      56       71.8
50-53           8      34        68           8      64       82.1
54-57          10      44        88          10      74       94.9
58-61           4      48        96           2      76       97.4
62-65           -      48        96           -      76       97.4
66-69           2      50       100           2      78      100.0

Total          50                            78










Fig. 9.21

In fig. 9.21, the two ogive curves are shown on the same
pair of axes, using the cumulative frequency distribution. In
fig. 9.22 are plotted the cumulative percentage distributions.




Fig. 9.22 (cumulative percentage curves of the two sections; horizontal axis : weights)








EXERCISES 

1. What points must be borne in mind while drawing curves ?

2. Distinguish between Natural Scale and Ratio Scale. In which cases
should the latter scale be used ?

3. What is a false base line ? Under what conditions would its use be
desirable ?

4. What do you mean by :

(a) frequency curves,
(b) bivariate charts,
(c) curves picturing time series ?
5. Represent the following data graphically :

Index Numbers of Wholesale Prices

[Base (1939 August) = 100]

                    Cereals    Pulses    Fibres    Oilseeds
1948 (Average)        443        424       432       499
1949                  465        438       446       593
1950                  471        449       476       665
1951                  483        506       622       679
1952                  450        483       454       484
1953                  451        494       420       573


6. The following table gives the value of imports and of exports of India
for the years 1953-54 and 1954-55 in crores of rupees :

                 1953-54                 1954-55
Months      Imports   Exports       Imports   Exports
Jan.           22        28            26        18
Feb.           24        28            21        20
Mar.           76        23            19        17
Apr.           28        21            16        17
May            31        20            21        20
June           29        22            20        20
July           32        21            23        18
Aug.           33        19            76        20
Sept.          32        20            21    [illegible]
Oct.           31        19            78        23
Nov.           25        18            70        72
Dec.           24        19            21        28

Plot the above figures on a graph paper, and show also the balance
of trade by means of a curve.

7. The following table gives the proportions of married women in 1950
and 1951 among women of every age. Show graphically that the increase
was more marked for the women of younger years.

Percentage of Married Women (imaginary data)


A** 


1950 

1951 

18 


. -~rr$ 

7fiT 

20 


562 

384 

22 


50 7 

529 

24 


620 

61* 

26 


65*7 

67*8 










8. Draw a graph from the following data :

Articles of Export from India

(In lakhs of rupees)


Article* 

1936 

1945 

1944 

1945 

Cotton raw ft watte 

2 m 

m 

901 

1009 

Cotton yarn ft nvfrt. 

m 

4291 

4000 

3213 

Jute raw 

m4 

737 

795 

1263 

Jute mfrs. 

2618 

4443 

6059 

5618 

Tea 

2547 

3544 

4065 

3659 

Hide* ft skint 

1158 

1385 

1404 

2141 

Oi heeds 

1594 

102(5 

1125 

1371 


9. Plot the following figures relating to population of India so as to show
the proportionate increase in population from one period to another.

Year          Population (000,000's omitted)
1872                    210
1881                    250
1891                    290
1901                    295
1911                    315
1921                    320
1931                    350
1941                    390






Chapter 10 
Measures of Central Tendency 


It has been pointed out earlier that for a proper
understanding of the quantitative data they should be classified
and converted into a frequency distribution. This process of
condensation reduces their bulk and gives prominence to the
underlying structure of the data. But classification is only
the first step in statistical analysis. If the characteristics of
given data are to be properly revealed or if one distribution
is to be compared with another, it is necessary that the
frequency distribution itself must be summarised and condensed
in such a manner that its essence is expressed in a few figures
only.

For a proper appreciation of the methods that are
employed to summarise frequency distributions it is necessary to
carefully note that most of such distributions have certain
common characteristics.

In the first place the size of the variable varies from item
to item. Heights as well as weights vary as between
individuals. Gold prices vary from day to day and week to week.
In a given case, however, all values are distributed along a
scale and lie between two extreme values. Between these
two extreme limits in most of the cases the items are
distributed in such a manner that if we move from the lowest
value to the highest value the number of items at each
successive stage on the scale of values increases with a certain
amount of regularity till we reach a maximum ; and then as
we proceed further the number of items decreases almost with
the same regularity at each stage. Thus on the scale of values
there is a point at which the largest number of the items tend to
cluster. This is an important characteristic of frequency
distributions.





It follows from the above that as we move from the point
of greatest concentration towards one or the other extreme
on the scale of values, the number of items will go on
decreasing at each successive stage. This means that the
greater the difference between the point of maximum
frequency and any other point on the scale of values the less will be
the number of observations and vice versa.

The above discussion of the general characteristics of
frequency distributions suggests certain clues for selecting a
value which may best describe them. We have seen that
on the scale of values there is a point where the
concentration is greatest, or a value which occurs the greatest
number of times. This value constitutes a measure of the central
tendency of a given distribution and may be regarded as the
most representative of the series under investigation. This
value, it must be noted, is only one of the several ways of
measuring the central tendency. All such measures are
commonly termed ‘averages’.

Simply stated, an ‘average’ of a statistical series is a
single value of the variable which is a satisfactory
representative, for the object in view, of the distribution. It may be
defined as ‘a short expression of the phenomenon which levels
all differences of the series’.

Types of Averages 

Different methods of measuring ‘central tendency’ provide
us with different kinds of averages. The following are the
main types of averages that are commonly used :

1. Mean (Arithmetic Mean), 

2. Median, 

3. Mode, 

4. Geometric Mean, and 

5. Harmonic Mean. 

Since each one of the above averages has its own
individual characteristics and properties, a decision must always
be made as to which average would be most useful in view
of the nature of the statistical data and the purpose
of the inquiry. A proper appreciation of their individual
properties would be possible only after a careful study of the
methods of computing each such average.

Arithmetic Mean 

The arithmetic mean is so much in everyday use that nearly
all of us are familiar with the concept. The arithmetic mean
of a series of items is obtained by adding the values of all the
items and dividing the total by the number of items. We can
symbolise this computation as :

$$\bar{X} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{N} \quad \text{or} \quad \bar{X} = \frac{\sum x}{N}$$

where $x_1, x_2, x_3, \dots, x_n$ are the given values,
$N$ = Number of items,
$\sum$ means ‘the sum of’,
$\bar{X}$ represents ‘the mean of x’.

The formula should be read as : ‘The Arithmetic Mean
of the x's is the sum of the x's divided by the number of items.’
Thus the mean of measures 3, 5, 7, 9 is equal to the sum of
these measures divided by 4 (number of measures) which is
24/4 or 6.

Computing Arithmetic Mean : Individual Observations

Example : Find the mean of the following observations :

Individual          Annual income (Rs.)
    A                     1200
    B                     1500
    C                     1800
    D                     2000
    E                     2500
    F                     3000
    G                     2200
    H                     3500
    I                     3700
    J                     2900

Total  10               24,300

Arithmetic Average

$$\bar{X} = \frac{\sum x}{N} = \frac{24300}{10} = \text{Rs. } 2430$$
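
The same computation can be sketched in Python, using the ten incomes of the example; the variable names are illustrative.

```python
# Annual incomes (Rs.) of the ten individuals A to J.
incomes = {
    "A": 1200, "B": 1500, "C": 1800, "D": 2000, "E": 2500,
    "F": 3000, "G": 2200, "H": 3500, "I": 3700, "J": 2900,
}

total_income = sum(incomes.values())     # sum of the x's
number_of_items = len(incomes)           # N
mean_income = total_income / number_of_items

print(total_income, number_of_items, mean_income)
```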

If N is a large number and no adding machine is
available, the simple operation of straight-forward addition may
be very laborious. It may, however, be made simpler by
employing a provisional mean (or assumed mean). A glance
at the above data shows that the mean is somewhere between
2,000 and 3,000. We select Rs. 2,500 as provisional mean
and add the positive and negative deviations from it. These
deviations will be :

Individual     Income     Deviations from assumed mean (2500)
                 Rs.      (Actual size - Assumed mean)
    A           1200              -1300
    B           1500              -1000
    C           1800              - 700
    D           2000              - 500
    E           2500                  0
    F           3000              + 500
    G           2200              - 300
    H           3500              +1000
    I           3700              +1200
    J           2900              + 400

                          Total   - 700

The sum of the deviations is -700. If we divide it by N
(10), the result is -700/10 = -70. This means that the mean
income deviates from 2500 by -70 and hence the mean is
Rs. 2500 + (-70) = Rs. 2430.
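
The provisional-mean computation above can be sketched in Python. The function name `short_cut_mean` is hypothetical; it simply adds the average deviation to the assumed mean.

```python
def short_cut_mean(values, assumed_mean):
    """Mean obtained from deviations measured from an assumed (provisional) mean."""
    deviations = [x - assumed_mean for x in values]
    return assumed_mean + sum(deviations) / len(values)

incomes = [1200, 1500, 1800, 2000, 2500, 3000, 2200, 3500, 3700, 2900]
assumed_mean = 2500

deviations = [x - assumed_mean for x in incomes]
mean_income = short_cut_mean(incomes, assumed_mean)

print(deviations)
print(sum(deviations), mean_income)
```

Any other provisional mean gives the same answer; choosing one near the true mean only keeps the deviations small.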

The method of computing the Mean explained above can
be stated as :

$$\bar{X} = A + \frac{\sum x'}{N}$$

where $X$ is the variable under study,
$\bar{X}$ is the actual mean of the distribution,
$A$ is the arbitrary mean,
$x'$ is $(X - A)$, i.e., the deviation of any single value from the
arbitrary mean,
$\sum x'$ is the sum of deviations from the arbitrary mean,
$N$ is the number of items.

This method of computation is known as the ‘short cut method’.
It is based on the fact that the algebraic sum of the deviations
of individual values from the mean is zero.
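
The two facts used here follow directly from the definition $\bar{X} = \sum X / N$. The derivation below is a sketch in the notation of this section, with $A$ standing for any arbitrary mean:

$$\sum (X - \bar{X}) = \sum X - N\bar{X} = N\bar{X} - N\bar{X} = 0$$

$$\sum x' = \sum (X - A) = \sum X - NA = N\bar{X} - NA \quad\Longrightarrow\quad \bar{X} = A + \frac{\sum x'}{N}$$

The first line shows that deviations from the actual mean sum to zero; the second shows that deviations from any other value sum to $N$ times the gap between that value and the actual mean.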

Observe the following illustration :

    X            x                     x'
 (values)   (deviation from      (deviation from
               mean 6)           assumed mean 7)
    3            -3                    -4
    5            -1                    -2
    7            +1                     0
    9            +3                    +2

            Σx = 0               Σx' = -4

It is evident from the above that the sum of the deviations
from mean 6 is zero. But when deviations are computed
from 7 (assumed mean) their sum is -4.

This means a deviation computed from arbitrary mean is
on an average -4/4 (= -1) more than that computed from
actual mean, i.e., actual mean is -1 more than arbitrary mean.
Hence in order to find the actual mean we must add this
difference to the arbitrary mean.

Thus $\bar{X} = A + \frac{\sum x'}{N}$,

that is $\bar{X} = 7 + (-1) = 6$.

Computing Arithmetic Mean : Discrete Frequency
Distribution

A frequency distribution table shows how often each value
of the variable occurs in the sample under consideration. The
table below shows the marks obtained by 46 students :

TABLE 10.1

Frequency of Marks of 46 Students

 Marks        Frequency
   X              f
   9              1
  10              2
  11              3
  12              6
  13             10
  14             11
  15              7
  16              3
  17              2
  18              1

 Total           46

In the above table the first column, headed X, shows the
different values assumed by the variable X in the observations,
and second column, headed f, gives the number of times of
occurrence, or frequency, of each value. The total of the
second column is of course the total number of observations,
i.e., Σf or N.








The next step is to make a third column, headed fX, by
multiplying together the corresponding pairs of numbers in
the X and f columns. We then sum the entries of the fX
column, getting ΣfX. The mean is then obtained by dividing
the sum by N as before, i.e.,

$$\bar{X} = \frac{\sum fX}{N}$$

Example : Find the mean of the data of Table 10.1

   X          f          fX
   9          1           9
  10          2          20
  11          3          33
  12          6          72
  13         10         130
  14         11         154
  15          7         105
  16          3          48
  17          2          34
  18          1          18

             46         623

ΣfX = 623
N = 46

$$\bar{X} = \frac{\sum fX}{N} = \frac{623}{46} = 13.54$$
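
The fX column and the resulting mean can be reproduced with a short Python sketch, using the marks and frequencies of table 10.1; the variable names are illustrative.

```python
marks = [9, 10, 11, 12, 13, 14, 15, 16, 17, 18]    # X
frequencies = [1, 2, 3, 6, 10, 11, 7, 3, 2, 1]      # f

# Third column: each value multiplied by its frequency.
fx = [x * f for x, f in zip(marks, frequencies)]

n = sum(frequencies)     # N, the total number of observations
sum_fx = sum(fx)         # sum of the fX column
mean_marks = sum_fx / n

print(fx)
print(n, sum_fx, round(mean_marks, 2))
```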

The short cut method of computing the mean may be
employed here also, to avoid long and laborious calculations.
First the deviation of each value from the assumed mean should
be found and multiplied by its corresponding frequency.
Then the algebraic sum of the products is divided by the total
frequency N. The quotient thus obtained added to the
assumed mean gives the desired mean. The following example
shows the working of the short cut method.







Example i Find iht man ef ifa data giotn in tail* 10.1 tj> tfw tftart 
mi mlhod. 


deviation from 
assumed mean (13) 




x' 

/*■ 

9 

I 

--■4 

—4 

10 ■ 

2 


—6 

n 

3 

—•2 

.-6 

12 

6 

.1 

—6 

13 

io 

0 

-22* 

14 

li 

■t \ 

f 11 

15 

7 

2 

h 14 

16 

:t 

t-3 

1 9 

17 

2 

■; - 4 

-r 6 

ttt 

1 

•i-5 

5 


46 


47 

_22 


25 

* Since there will he no entry in the jx column correspond* 
ing to ar**0, this is a convenient place to write the sum (here 
- 22) of the negative entries in the Jx column. The sum of the 
positive products in the fx column, namely 47, is written in the 
same line as the total A, The final sum 25 is then easily 
obtained. 

r/s',,25 
-Va* 46 


X- A 


zj* 

;-v 


■' as J 3 4* 


25 

46" 


13*54. 
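A short sketch of the short cut method in Python follows. It is an illustration only; `shortcut_mean` and `freq` are names chosen here, not the author's. It also shows that the result does not depend on which assumed mean is chosen.

```python
# Table 10.1 as a mapping: value X -> frequency f
freq = {9: 1, 10: 2, 11: 3, 12: 6, 13: 10, 14: 11, 15: 7, 16: 3, 17: 2, 18: 1}

def shortcut_mean(table, assumed):
    """Assumed mean plus the average deviation from it."""
    n = sum(table.values())
    # Sum of f * x', where x' is the deviation of X from the assumed mean
    sum_fx = sum(f * (x - assumed) for x, f in table.items())
    return assumed + sum_fx / n

print(round(shortcut_mean(freq, 13), 2))   # assumed mean 13, as in the example
print(round(shortcut_mean(freq, 10), 2))   # any other assumed mean gives the same mean
```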






Arithmetic Mean from Continuous Frequency
Distribution : Long Method

In computing the arithmetic mean from the grouped
data, we take the mid-value of each class as representative
of that class, multiply the various mid-values by their corresponding
frequencies, total these products, and divide by the
total number of items. Symbolically, if $m_1, m_2, m_3, \dots$ represent
the mid-values and $f_1, f_2, f_3, \dots$ the frequencies, then

$\bar{X} = \frac{f_1 m_1 + f_2 m_2 + f_3 m_3 + \dots}{f_1 + f_2 + f_3 + \dots} = \frac{\sum fm}{N}$

The mid-value¹ of a class is obtained by adding the upper
and lower limits of the class and dividing by two. For every
frequency distribution we must consider carefully what these
limits are. When a class is (for example) '32.00-33.99', the
mid-value is 33.00 since the relative discrepancy is small. In
determining the mid-value for a frequency distribution, it is
probably best to assume that figures were rounded to the
nearest unit given. For example, if a one-inch class is written
'12.0-12.9 inches', consider the limits as 11.95 and 12.95
inches ; if a five-pound class is written '10-14' pounds, consider
the limits as 9.5 and 14.5. However, for discrete data a
ten-mark class '70-79' has the limits 70 and 79.

Example : Computation of the Mean from data relating to monthly
earnings¹ of workers.

Considering the mid-value for the monthly earnings of the 
workers as discussed above, the mean will be computed as : 


* See Chapter tint Mi- 







TABLE 10.2

Distribution of Male Workers by Average
Monthly Earnings

(The computation of the Arithmetic Mean : Long Method)

Monthly Earnings    Mid-point    Number of
      Rs.              Rs.        workers         Rs.
                        m            f            fm
  27.5- 32.5            30          120          3,600
  32.5- 37.5            35          152          5,320
  37.5- 42.5            40          170          6,800
  42.5- 47.5            45          214          9,630
  47.5- 52.5            50          410         20,500
  52.5- 57.5            55          429         23,595
  57.5- 62.5            60          568         34,080
  62.5- 67.5            65          650         42,250
  67.5- 72.5            70          795         55,650
  72.5- 77.5            75          915         68,625
  77.5- 82.5            80          745         59,600
  82.5- 87.5            85          530         45,050
  87.5- 92.5            90          259         23,310
  92.5- 97.5            95          152         14,440
  97.5-102.5           100          107         10,700
 102.5-107.5           105           50          5,250
 107.5-112.5           110           25          2,750

  Total                           6,291       4,31,150

$\bar{X} = \frac{\sum fm}{N} = \frac{4,31,150}{6,291} = \text{Rs. } 68.5$
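The long method for grouped data can be sketched in Python as follows. This is an illustrative sketch, assuming equal class intervals of Rs. 5 starting at 27.5 as in Table 10.2; the function name `grouped_mean` is not from the text.

```python
# Frequencies of the 17 classes of Table 10.2 (27.5-32.5, 32.5-37.5, ..., 107.5-112.5)
freqs = [120, 152, 170, 214, 410, 429, 568, 650, 795,
         915, 745, 530, 259, 152, 107, 50, 25]

def grouped_mean(lowest_limit, width, freqs):
    """Long method: treat every item as lying at the mid-value of its class."""
    mids = [lowest_limit + width * k + width / 2 for k in range(len(freqs))]
    n = sum(freqs)
    return sum(f * m for f, m in zip(freqs, mids)) / n

print(round(grouped_mean(27.5, 5.0, freqs), 1))
```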

If we compute the arithmetic mean from unclassified data
it may differ slightly from Rs. 68.5. This lack of agreement is
due to the inadequacy of the mid-value assumptions.
It is almost always true that none of the mid-values is actually
the true concentration point of its class. But in the case of
symmetrical distributions there is greater possibility of errors
compensating, some of the mid-points erring by being too low
and others erring by being too high. However, if the frequency
tails off towards either the high or low values, i.e., if it departs
seriously from a symmetrical distribution, the arithmetic average
computed will be somewhat in error because of the failure of
the errors in the mid-point assumption to compensate.

Arithmetic Mean from Grouped Data ; Short Method 

In the case of individual observations, we could take a value 
as assumed mean and making use of the fact that the sum of 
deviations of all individual items from the true average equals 
zero, compute the necessary correction to obtain the true arith¬ 
metic average. This method can also be used for grouped data 
and will save appreciable time in computing a mean for a 
frequency distribution. 

In order to calculate the mean of the observations in grouped
data we take an arbitrary origin, and calculate the discrepancy
between this point and the true mean. The arbitrary origin
should preferably be the mid-point of a class near the centre of
the distribution. From this assumed mean (arbitrary origin), the
deviation, in terms of class intervals,* of the items in each class
should be found out. Thus for the items in the class containing
the assumed mean the deviation will be zero, +1 for the items
in the next higher class, -1 for the next lower class and so on.
Thus we will have a new column x' showing these deviations.
The deviation of each class is multiplied by the frequency of
that class, taking account of algebraic signs. These products
give us another column fx'. The algebraic sum of the numbers
entered in this column, $\sum fx'$, is obtained and divided by the
number of items N to give us the discrepancy D between the
true mean and the arbitrary origin in class interval units, i.e.,

$D = \frac{\sum fx'}{N}$



* or in terms of any other factor if class intervals are not equal.

In order to find the true mean, D should be multiplied by
the length of the class interval and the product added to the
assumed mean. Thus,

if i is the size of the class interval, and A is the assumed mean,

$\bar{X} = A + D \times i$

TABLE 10.3

Distribution of Male Workers by Average
Monthly Earnings

(Computation of arithmetic mean : short-cut method with class
interval units)

  Monthly       Mid-      No. of     Deviation in
  Earnings      point     workers    interval units
    Rs.          Rs.         f            x'             fx'
  27.5- 32.5      30        120           -7            -840
  32.5- 37.5      35        152           -6            -912
  37.5- 42.5      40        170           -5            -850
  42.5- 47.5      45        214           -4            -856
  47.5- 52.5      50        410           -3          -1,230
  52.5- 57.5      55        429           -2            -858
  57.5- 62.5      60        568           -1            -568
  62.5- 67.5      65        650            0          -6,114
  67.5- 72.5      70        795            1             795
  72.5- 77.5      75        915            2           1,830
  77.5- 82.5      80        745            3           2,235
  82.5- 87.5      85        530            4           2,120
  87.5- 92.5      90        259            5           1,295
  92.5- 97.5      95        152            6             912
  97.5-102.5     100        107            7             749
 102.5-107.5     105         50            8             400
 107.5-112.5     110         25            9             225

  Total                   6,291                      +10,561
                                                      -6,114
                                                      ------
                                                       4,447

$\sum fx' = 4447$, $i = 5$, $N = 6291$

$D = \frac{\sum fx'}{N} = \frac{4447}{6291}$

$\bar{X} = A + D \times i = 65 + \frac{4447}{6291} \times 5 = 68.53$
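The short method with class-interval units can be sketched in Python as below. This is a sketch under the assumption of equal class intervals; `step_deviation_mean` is an illustrative name. It also confirms that moving the arbitrary origin from 65 to 30 leaves the mean unchanged.

```python
# Mid-points and frequencies of the 17 earnings classes (Rs. 30, 35, ..., 110)
mids = [30 + 5 * k for k in range(17)]
freqs = [120, 152, 170, 214, 410, 429, 568, 650, 795,
         915, 745, 530, 259, 152, 107, 50, 25]

def step_deviation_mean(mids, freqs, assumed, width):
    """Mean = A + D * i, with D the mean deviation in class-interval units."""
    n = sum(freqs)
    steps = [(m - assumed) / width for m in mids]      # the x' column
    d = sum(f * s for f, s in zip(freqs, steps)) / n   # D = sum(f x') / N
    return assumed + d * width

print(round(step_deviation_mean(mids, freqs, 65, 5), 2))  # origin at the 65 mid-point
print(round(step_deviation_mean(mids, freqs, 30, 5), 2))  # origin at the lowest mid-point
```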


Computation of mean by use of step interval units
with origin at the mid point of lowest interval

TABLE 10.4

    Size           f         x'         fx'
  27.5- 32.5      120         0            0
  32.5- 37.5      152         1          152
  37.5- 42.5      170         2          340
  42.5- 47.5      214         3          642
  47.5- 52.5      410         4        1,640
  52.5- 57.5      429         5        2,145
  57.5- 62.5      568         6        3,408
  62.5- 67.5      650         7        4,550
  67.5- 72.5      795         8        6,360
  72.5- 77.5      915         9        8,235
  77.5- 82.5      745        10        7,450
  82.5- 87.5      530        11        5,830
  87.5- 92.5      259        12        3,108
  92.5- 97.5      152        13        1,976
  97.5-102.5      107        14        1,498
 102.5-107.5       50        15          750
 107.5-112.5       25        16          400

  Total         6,291                 48,484

$\bar{X} = 30 + \frac{48484}{6291} \times 5 = 30 + \frac{242420}{6291} = 30 + 38.53 = 68.53$

If, however, the intervals are unequal the formula should
be : $\bar{X} = A + \frac{\sum fx'}{N}$, where x' represents actual deviation from
assumed mean.

Properties of the Arithmetic Mean 

(1) The sum of the deviations from the mean is equal to zero.
Proof :

Let the variable x have n values $x_1, x_2, x_3, \dots, x_n$ ; then
by definition,

mean, $\bar{X} = \frac{x_1 + x_2 + \dots + x_n}{n}$

or $n\bar{X} = x_1 + x_2 + \dots + x_n$

or $0 = x_1 + x_2 + \dots + x_n - n\bar{X}$

or $0 = (x_1 - \bar{X}) + (x_2 - \bar{X}) + \dots + (x_n - \bar{X})$

Hence the statement that the sum of deviations from the mean
is zero.

(2) The combined mean of two series is equal to the
sum of the products of the number of items in each series with
their respective means divided by the total number of items in
the two series, i.e.,

$\bar{X} = \frac{N_1 \bar{X}_1 + N_2 \bar{X}_2}{N_1 + N_2}$

(3) The sum of the squared deviations is less when the deviations
are taken from the mean $\bar{X}$ than when they are taken from
any number A other than $\bar{X}$, i.e.,

$\sum (X_i - \bar{X})^2 < \sum (X_i - A)^2$

(4) The mean of all the sums (or differences) of corresponding
items in two series, the number of items being equal in the
two, is equal to the sum (or difference) of the means of the two
series. Let $X = X_1 \pm X_2$. Then

$\sum X = \sum X_1 \pm \sum X_2$

Dividing by the total number of items in each series,

$\bar{X} = \bar{X}_1 \pm \bar{X}_2$
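The four properties can be verified numerically. The sketch below uses small made-up series (`a`, `b` and `c` are illustrative data, not from the text) and exact fractions so that the equalities hold without rounding.

```python
from fractions import Fraction

def mean(values):
    """Exact arithmetic mean as a Fraction."""
    return Fraction(sum(values), len(values))

a = [4, 7, 9, 12]
b = [10, 20, 30, 44]   # same length as a, used for property (4)
c = [3, 5, 10]         # different length, used for property (2)

# (1) Deviations from the mean sum to zero.
deviation_sum = sum(x - mean(a) for x in a)

# (2) Combined mean of two series from their sizes and means.
combined = (len(a) * mean(a) + len(c) * mean(c)) / (len(a) + len(c))

# (3) Squared deviations are smallest when taken from the mean.
def squared_deviations(values, origin):
    return sum((x - origin) ** 2 for x in values)

# (4) Mean of item-by-item sums equals the sum of the means.
mean_of_sums = mean([x + y for x, y in zip(a, b)])

print(deviation_sum, combined == mean(a + c), mean_of_sums == mean(a) + mean(b))
```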

Nature and Significance of Arithmetic Mean

The Arithmetic Mean is affected by the value of each item
since it is the sum of the values of all items that is used in
its determination. It, however, gives undue weight to extreme
values. For example, the earnings of a few highly paid
workers in a factory would cause the arithmetic mean to
be somewhat higher than it would be without those earnings.
But for these highly paid workers the wage conditions in this
factory may be approximately the same as in some other
factory, yet the arithmetic mean of earnings here would be
higher because of the inclusion of extreme values.

The arithmetic mean, very often, is not the value of any
item included in its computation. In a discrete series, it may
be a value which actually does not exist. The average marks
obtained by a student from the data given in Table 10.1 is 13.54.
Now marks are not awarded in fractions. These characteristics,
however, are not serious disadvantages. In certain cases it is
necessary to include extreme values. When averaging sales
effected over a period of time it is necessary to include the
sales made on each day of this period irrespective of their size
because average daily sales are compared with average selling
expenses. Furthermore, even though the above stated average
marks were not obtained by any student, it is useful in
comparing standards in two schools.

The unequal and open-end classes affect the arithmetic
mean considerably. Because of the skewness, and accentuated
by the uneven classes, the offsetting of errors in the mid-value
assumptions is not fully realized. This results in a value of
mean for the frequency distribution which may differ appreciably
from the value of arithmetic mean of the raw data. Also,
as in the open-end classes mid-values are not apparent, the
value of $\bar{X}$ cannot be accurately determined.

The arithmetic mean has a definite advantage in that it is
commonly used and, therefore, familiar to everyone. It does
not need the organisation of data. It is a more reliable estimate
of central tendency in case of population than the median or
the mode for a wide variety of populations. Furthermore it
can be located with mathematical exactness and is most amenable
to algebraic treatment.

The Median 

The second measure of central tendency that has a wide
usage in statistical works is the median.¹ Median is that value
of a variable which divides the series in such a manner that the
number of items below it is equal to the number of items above
it. Half the total number of items lies below the median, and
half above it. The median is thus a positional average.

The Median from Ungrouped Data

The median of ungrouped data is found easily if the items
are first arranged in order of magnitude. The median may
then be located simply by counting, and its value can be
obtained by reading the value of the middle item. If we have
five items whose values are 8, 10, 1, 3 and 5, the values are
first arrayed : 1, 3, 5, 8 and 10. It is now apparent that the
value of the median is 5, since two items are below that value
and two items are above it. When there is an even number of
cases, as 6, 13, 4, 8, 5 and 16, the determination of median is
more difficult. When these values are arrayed : 4, 5, 6, 8, 13
and 16, the median value may be anywhere between 6 and 8.
Sometimes the median is assumed to be half way between
the middle items, 6 and 8 in this case. We shall take the
median at 7 $\left(= \frac{6 + 8}{2}\right)$.

I "Median w the magnitude permitting if the item half way the *enet; ,f 

..UtHvJf'V 



The position of the median can be found by the formula
$\frac{N + 1}{2}$, in which N is the number of items in the array. In the
example given below the median is the value of the 7th item
($\frac{13 + 1}{2}$, viz., 7th item), which means that the median
of weekly sales for three months is Rs. 10,300. If the number
of items in the array is even, the practice is to take the mean of
the values of the two middle items, as the median must lie between
them. If one item is removed from the following example
the median would be the size of the 6.5th item, i.e., the size of
the 6th item + the size of the 7th item divided by two

$= \frac{10,200 + 10,300}{2} = \text{Rs. } 10,250.$


TABLE 10.5

              Weekly sales for      Weekly sales for
              three months          three months
 Week No.     (original data,       (same data,
              unarranged)           arranged)
    1            10,200                 9,800
    2            10,600                 9,900
    3             9,800                 9,900
    4            10,000                10,000
    5            10,300                10,200
    6             9,900                10,200
    7            10,200                10,300  median
    8            10,800                10,400
    9            10,600                10,400
   10            10,400                10,600
   11             9,900                10,600
   12            10,900                10,800
   13            10,400                10,900





Thus the steps necessary for finding the median in an
ungrouped series are :

(1) Arrange the data in order of magnitude.

(2) Find out the middle item by taking the $\frac{N + 1}{2}$th
item.

(3) Determine the value of the middle item as explained
above.
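These three steps can be sketched in Python as follows. This is a minimal illustration; `median_ungrouped` is a name chosen here, and the weekly sales are those of Table 10.5.

```python
def median_ungrouped(values):
    """Arrange the items, locate the (N + 1)/2-th item, and read its value."""
    ordered = sorted(values)              # step 1: array the data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]               # odd N: the single middle item
    # even N: mean of the two middle items
    return (ordered[mid - 1] + ordered[mid]) / 2

weekly_sales = [10200, 10600, 9800, 10000, 10300, 9900, 10200,
                10800, 10600, 10400, 9900, 10900, 10400]

print(median_ungrouped(weekly_sales))                   # 13 items: the 7th item
print(median_ungrouped(sorted(weekly_sales)[:-1]))      # largest item removed: 12 items
```

With the largest week removed the array has twelve items, so the median falls half way between the 6th and 7th items.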

The Median from Grouped Data

(a) Discrete or Non-continuous Series. In the case of grouped
data (both continuous and non-continuous) arrangement of the
items according to their respective sizes has already been
effected since classes will always be in order. Besides this, the
essential procedure is similar to that just described. When the
items have been grouped by values or sizes (see the example
below), the median value or size is that of the group in which
the median item falls. The numbers occurring are arranged
in a cumulative frequency column, and the median is the size
of the $\frac{N + 1}{2}$th item. For example, in the array in the following
example the median size is the size of the $\frac{4609 + 1}{2}$ = 2305th
item. By inspecting the cumulative totals, it is seen that this
item falls in the cumulative total opposite size 8½, and this,
therefore, is the median size, i.e., the median average size of
shoes sold is 8½.

The series in this example is non-continuous or discrete,
because it is set up in classes differing by definite amounts
(sizes in half inches).



TABLE 10.6

Number of Shoes Sold by Sizes in One Year

  Size      Number of Pairs      Cumulative Total
   5               30                    30
   5½              40                    70
   6               50                   120
   6½             150                   270
   7              300                   570
   7½             600                 1,170
   8              950                 2,120
   8½             820                 2,940
   9              750                 3,690
   9½             440                 4,130
  10              250                 4,380
  10½             150                 4,530
  11               40                 4,570
  11½              39                 4,609

  Total         4,609
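Reading the median size off the cumulative totals can be sketched in Python as below. The helper name `item_size` is illustrative, and half sizes are written as decimals (8.5 for 8½).

```python
from itertools import accumulate

# Table 10.6: shoe sizes and the number of pairs sold of each size
sizes = [5 + 0.5 * k for k in range(14)]          # 5, 5.5, ..., 11.5
pairs = [30, 40, 50, 150, 300, 600, 950, 820, 750, 440, 250, 150, 40, 39]

def item_size(values, freqs, position):
    """Size of the item at a 1-based position, read off the cumulative totals."""
    for value, cumulative in zip(values, accumulate(freqs)):
        if cumulative >= position:
            return value
    raise ValueError("position exceeds the number of items")

n = sum(pairs)
median_position = (n + 1) / 2
print(median_position, item_size(sizes, pairs, median_position))
```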


(b) Continuous Series.* When the items have been
grouped into classes with class intervals as in Table 10.7,
the problem of determining the median is slightly complicated,
because the real values of the individual items are not known.
In this case (where N = 6,291) the median value is that
value on either side of which $\frac{N}{2}$ or $\frac{6291}{2}$ or 3145.5 items lie.
If we start from the lower end of the scale and move through
the successive classes we find as we cross the upper limit of the
8th class (62.5-67.5), 2713 items have been passed ; and there
are 3508 items below the upper limit of the 9th class (67.5-72.5).
Thus we conclude that it is somewhere between the
lower and upper limits of this class (9th) that the desired point
lies (the point which has 3145.5 items on each side of it).

* To find the middle item in the case of continuous distribution
we use the formula $\frac{N}{2}$, where N is the total number of observations.




TABLE 10.7

Distribution of Male Workers by Average
Monthly Earnings

 Group     Monthly Earnings     No. of      Cumulative No.
  No.            Rs.            Workers       of Workers
   1         27.5- 32.5           120             120
   2         32.5- 37.5           152             272
   3         37.5- 42.5           170             442
   4         42.5- 47.5           214             656
   5         47.5- 52.5           410           1,066
   6         52.5- 57.5           429           1,495
   7         57.5- 62.5           568           2,063
   8         62.5- 67.5           650           2,713
   9         67.5- 72.5           795           3,508
  10         72.5- 77.5           915           4,423
  11         77.5- 82.5           745           5,168
  12         82.5- 87.5           530           5,698
  13         87.5- 92.5           259           5,957
  14         92.5- 97.5           152           6,109
  15         97.5-102.5           107           6,216
  16        102.5-107.5            50           6,266
  17        107.5-112.5            25           6,291

             Total              6,291


For the purpose of determining this point we start with
the assumption that within any given class there is uniform
distribution of items. Of the 795 items included in the 9th class
432.5 (3145.5 - 2713) are to be added to the cumulative frequency
of the 8th class to reach the desired point. On the assumption
of even distribution 432.5 items will lie within a distance
on the scale equal to 432.5/795 of the class interval. The class
interval is 5. Therefore, 432.5/795 of 5 is equal to 2.72. If
this 2.72 is added to the lower limit of the 9th class we get
67.5 + 2.72 = 70.2. This point on the scale, i.e., 70.2, is the
dividing line on each side of which lie 3145.5 cases. This,
then, is the value of the median.





This can be put in the form of a formula :

$\text{Median} = l + \frac{i}{f} \times d$, where

l = lower limit of the class in which median lies,
i = class interval,
f = frequency of the class having the median,
d = N/2 (-) cumulative frequency of the next lower class,
N/2 = total number of items divided by 2.

This gives us :

$\text{Median} = 67.5 + \frac{5}{795} \times 432.5 = 67.5 + 2.72 = 70.2$
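The interpolation can be sketched in Python as follows. This is a sketch assuming equal class widths and uniform distribution of items within a class; `grouped_median` is an illustrative name, and the data are those of Table 10.7.

```python
freqs = [120, 152, 170, 214, 410, 429, 568, 650, 795,
         915, 745, 530, 259, 152, 107, 50, 25]
lower_limits = [27.5 + 5 * k for k in range(len(freqs))]   # 27.5, 32.5, ..., 107.5

def grouped_median(lower_limits, width, freqs):
    """Median = l + (i / f) * (N/2 - cumulative frequency below the class)."""
    half = sum(freqs) / 2
    below = 0                       # cumulative frequency of the lower classes
    for lower, f in zip(lower_limits, freqs):
        if below + f >= half:       # the median class has been reached
            return lower + width * (half - below) / f
        below += f
    raise ValueError("empty distribution")

print(round(grouped_median(lower_limits, 5, freqs), 1))
```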

Location of Median by Graphic Analysis

The median can also be determined approximately from
the ogive curve, i.e., the cumulative frequency distribution
plotted on a graph paper.

The following table shows the cumulative frequencies of
the previous example.

TABLE 10.8

 Monthly earnings     frequency     less than     more than
       27.5               -              0           6291
       32.5              120           120           6171
       37.5              152           272           6019
       42.5              170           442           5849
       47.5              214           656           5635
       52.5              410          1066           5225
       57.5              429          1495           4796
       62.5              568          2063           4228
       67.5              650          2713           3578
       72.5              795          3508           2783
       77.5              915          4423           1868
       82.5              745          5168           1123
       87.5              530          5698            593
       92.5              259          5957            334
       97.5              152          6109            182
      102.5              107          6216             75
      107.5               50          6266             25
      112.5               25          6291              0




The median value is obtained in the following manner :

(a) Compute N/2 and locate this point on the vertical
scale (3145.5).

(b) Draw a perpendicular to the Y-axis at this point and
produce the perpendicular to intersect the ogive curve.

(c) From the point of intersection, drop a perpendicular
on the X-axis. The point where this perpendicular cuts
the base gives the value of the median.



Fig. 10.1 

Another method is to draw two cumulative frequency
curves, one on the 'less than basis' and the other on the 'more
than basis'. The median may be graphically located by
dropping a perpendicular from the point at which the two
curves cross each other. This is shown in fig. 10.2.
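The crossing point of the two ogives can also be found numerically. The sketch below assumes the curves are drawn as straight-line segments between the class limits of Table 10.8; `ogive_crossing` is an illustrative name, not the author's.

```python
from itertools import accumulate

freqs = [120, 152, 170, 214, 410, 429, 568, 650, 795,
         915, 745, 530, 259, 152, 107, 50, 25]
limits = [27.5 + 5 * k for k in range(len(freqs) + 1)]   # 27.5, 32.5, ..., 112.5
n = sum(freqs)
less_than = [0] + list(accumulate(freqs))   # 'less than' ogive ordinates
more_than = [n - c for c in less_than]      # 'more than' ogive ordinates

def ogive_crossing(limits, less_than, more_than):
    """Abscissa at which the two straight-line ogives intersect."""
    for k in range(len(limits) - 1):
        gap_left = less_than[k] - more_than[k]
        gap_right = less_than[k + 1] - more_than[k + 1]
        if gap_left <= 0 <= gap_right:           # the curves cross in this class
            if gap_right == gap_left:
                return limits[k]
            share = -gap_left / (gap_right - gap_left)
            return limits[k] + share * (limits[k + 1] - limits[k])
    raise ValueError("the curves do not cross")

print(round(ogive_crossing(limits, less_than, more_than), 1))
```

At the crossing both curves stand at N/2, so the abscissa agrees with the median obtained by interpolation.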

Properties of the Median 

1. Median is a positional average and, therefore, is influenced
by the position of the items in the array and not by the
size of items.

2. Median is greater than the mean when the distribution
is skewed towards the left and is less than the mean when the
distribution is skewed towards the right. They are equal when
the distribution is symmetrical.

3. The sum of the absolute values of deviations is least
when the deviations are measured from the median, i.e.,

$\sum |X_i - \text{median}| < \sum |X_i - A|$

when A is any value other than the median.

Illustration : Let the variable X have seven values,

$x_1 < x_2 < x_3 < x_4 < x_5 < x_6 < x_7$

Then $x_4$ is the median.

   X        |X - med|        |X - A|  (A = x_5)
  x_1       x_4 - x_1        x_5 - x_1
  x_2       x_4 - x_2        x_5 - x_2
  x_3       x_4 - x_3        x_5 - x_3
  x_4           0            x_5 - x_4
  x_5       x_5 - x_4            0
  x_6       x_6 - x_4        x_6 - x_5
  x_7       x_7 - x_4        x_7 - x_5

$S = \sum |X - \text{med}| = x_5 + x_6 + x_7 - x_1 - x_2 - x_3$

$S' = \sum |X - A| = 2x_5 + x_6 + x_7 - x_1 - x_2 - x_3 - x_4$

$S' - S = x_5 - x_4$, which is +ve as $x_5 > x_4$.

Hence $S < S'$.
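A numerical check of this property follows, using seven made-up values (illustrative data, not from the text) whose median is 10.

```python
def total_absolute_deviation(values, origin):
    """Sum of |X - origin| over all items."""
    return sum(abs(x - origin) for x in values)

values = [2, 5, 7, 10, 14, 15, 21]      # seven ordered values; the 4th is the median
median = values[len(values) // 2]

s = total_absolute_deviation(values, median)            # deviations from the median
s_next = total_absolute_deviation(values, values[4])    # deviations from the 5th value

print(s, s_next, s_next - s)
```

The difference between the two sums is the gap between the 5th and 4th values, as in the algebraic illustration.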

Nature and Significance of Median 

The median can be obtained even though the values of all
items are not known. In fact, only the mid-item needs
valuation, provided it is known that there are the same number of
items on each side of it. This is particularly useful when it is
difficult to measure the characteristic under observation. The
intelligence of students in a class is difficult to measure in
certain units but it may be easy to arrange the students in
order of intelligence and determine the student with the
median average intelligence. That is to say that median may
be determined even though it is not measured mathematically.

Although the median is not influenced by the size of
extreme items, this is not likely to make it unrepresentative
of the data. The median may be located even when the
data are incomplete, the class intervals are irregular and final
classes are open intervals.

The chief disadvantages of it are that it needs the organisation
of data and that it is not likely to be representative when
the number of cases is small, because it is a positional average.
It should also be noted that although theoretically there are
equal number of cases on each side of median, this may not be
the case if the array contains groups of items of same size.
This is particularly true in case of discrete or non-continuous
series. The median size of shoes in the data in Table 10.6 is 8½
for this is the size of the mid-item, the 2305th. But there are 2120
items smaller and 1669 items larger, and median may be taken
as any of the 820 items of size 8½. This indefinite quality of
median particularly exists in discrete series.



Quartiles, Deciles and Percentiles

Median, as has been already explained, is the value of that
item which is located in the centre of the array and which
divides the array into two equal parts. We could, of course,
find three points which divide the array into 4 parts or nine
points which divide the array into 10 parts.

The three points which divide the array arranged in ascending
order of magnitude into four parts in such a way that each
portion contains an equal number of items, are called the
quartiles. The 1st, 2nd and 3rd points are termed as first,
second and third quartiles respectively. When computed, quartile
1 ($Q_1$) will be at a point on the scale of the variable
which divides the series so that 25 per cent of the items are
below it in value and 75 per cent are above it. Quartile 3
($Q_3$) is the value which is exceeded only by 25 per cent of the
items, and exceeds 75 per cent of them. Quartile 2 ($Q_2$) is the
value which is exceeded only by 50 per cent of the items, and
exceeds 50 per cent of them. This, of course, is the definition
of the median : $Q_2$ and median are identical both in value and
concept.

The deciles are the nine points which so divide the array
that each portion contains equal number of items. In this
case, as is clear from its name, the array is divided into
10 parts.

Symbolically,

1st decile = $D_1$
2nd   ,,   = $D_2$
5th   ,,   = $D_5$ = $Q_2$ = Median
9th   ,,   = $D_9$

The percentiles are the ninety-nine points which divide
the array into 100 parts in such a manner that the parts contain
equal number of items.

Symbolically :

1st percentile = $P_1$
2nd    ,,      = $P_2$
50th   ,,      = $P_{50}$ = $D_5$ = $Q_2$ = Median

Similarly, we can divide the array into 5 or 8 equal parts.
The points in the former case are termed as quintiles and in
the latter case as octiles.

Location of Quartiles, Deciles, etc.

Just as the median item can be found by counting $\frac{N + 1}{2}$
(or $\frac{N}{2}$ in case of continuous series) items from either end, so
also the quartiles can be found by counting $\frac{N + 1}{4}$ items from
each end.

In the case of ungrouped statistical data or in the case of
discrete frequency series the following formulae apply :

$Q_1$ = size of $\frac{N + 1}{4}$th item
$Q_3$ =   ,,    $\frac{3(N + 1)}{4}$th  ,,
$D_1$ =   ,,    $\frac{N + 1}{10}$th  ,,
$D_9$ =   ,,    $\frac{9(N + 1)}{10}$th  ,,
$P_1$ =   ,,    $\frac{N + 1}{100}$th  ,,
$P_9$ =   ,,    $\frac{9(N + 1)}{100}$th  ,,
3rd Quintile =   ,,    $\frac{3(N + 1)}{5}$th  ,,
5th Octile   =   ,,    $\frac{5(N + 1)}{8}$th  ,,





Individual Observations. In the series given below find the
first and third quartiles :

X : 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15

$Q_1$ = the size of $\frac{15 + 1}{4}$th item
      = the size of the 4th item = 4

$Q_3$ = the size of $\frac{3(15 + 1)}{4}$th item
      = the size of the 12th item = 12

Discrete Frequency Distribution. In the data given in Table 10.6,

$Q_1$ = size of $\frac{N + 1}{4}$th item
      = size of $\frac{4609 + 1}{4}$th item
      = size of 1152.5th item
      = 7½

$Q_3$ = size of $\frac{3(N + 1)}{4}$th item
      = size of $\frac{3(4609 + 1)}{4}$th item
      = size of 3457.5th item
      = 9

$P_{10}$ = size of $\frac{10(N + 1)}{100}$th item
         = size of $\frac{10(4609 + 1)}{100}$th item
         = size of 461st item
         = 7

$D_1$ = size of $\frac{N + 1}{10}$th item
      = size of $\frac{4609 + 1}{10}$th item
      = size of 461st item
      = 7
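The positional formulae can be sketched in Python as below. The names `quantile_position` and `item_size` are illustrative; as in the worked example, a fractional position is taken to fall in the first size whose cumulative total reaches it, which is an assumption of this sketch.

```python
from itertools import accumulate

sizes = [5 + 0.5 * k for k in range(14)]          # shoe sizes 5, 5.5, ..., 11.5
pairs = [30, 40, 50, 150, 300, 600, 950, 820, 750, 440, 250, 150, 40, 39]

def quantile_position(n, k, parts):
    """Position of the k-th point dividing n items into `parts` parts: k(N + 1)/parts."""
    return k * (n + 1) / parts

def item_size(values, freqs, position):
    """Size of the item at a 1-based position, read off the cumulative totals."""
    for value, cumulative in zip(values, accumulate(freqs)):
        if cumulative >= position:
            return value
    raise ValueError("position exceeds the number of items")

n = sum(pairs)
q1 = item_size(sizes, pairs, quantile_position(n, 1, 4))
q3 = item_size(sizes, pairs, quantile_position(n, 3, 4))
d1 = item_size(sizes, pairs, quantile_position(n, 1, 10))
print(q1, q3, d1)
```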


Continuous Frequency Distribution. It will be recalled that
when the data are grouped in continuous frequency tables, we
assume in case of calculating arithmetic mean that the total of
the values in any class is just what it would be if all the items
were located at the mid-point of the class. This is the same
thing as assuming that the items in the group are evenly or
uniformly distributed throughout the class (as assumed in case of
calculating median). Here, as in the case of median, we
assume that the items are evenly distributed over a class
interval.

In grouped data, quartiles, deciles, etc. (like median) are
located by interpolation. The formula used is

$Q_1 = l + \frac{i}{f} \times d$

where l stands for the lower limit of the group in which
$Q_1$ lies,

i stands for width of the class interval,
f stands for the frequency of that group,
d stands for the difference between $\frac{N}{4}$ and cumulative
frequency upto the next lower group.

Similarly,

$Q_3 = l + \frac{i}{f} \times d'$

where d' is the difference between $\frac{3N}{4}$ and the cumulative
frequency upto the next lower class.


Example : For the data of Table 10.7

$\frac{N}{4} = \frac{6291}{4} = 1572.75$

This item lies in the 57.5-62.5 group.

Hence $Q_1 = 57.5 + \frac{5(1572.75 - 1495)}{568}$

$= 57.5 + \frac{5 \times 77.75}{568}$

$= 57.5 + \frac{388.75}{568}$

$= 57.5 + 0.68$

$= 58.18$

Again, $\frac{3N}{4} = \frac{3(6291)}{4} = 4718.25$

This item lies in the 77.5-82.5 group.

Hence $Q_3 = 77.5 + \frac{5(4718.25 - 4423)}{745}$

$= 77.5 + \frac{1476.25}{745} = 79.5$ approximately.
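The same interpolation serves for any quartile, decile or percentile. The following Python sketch assumes equal class widths and uniform distribution within each class; `grouped_quantile` is an illustrative name, and the data are those of Table 10.7.

```python
freqs = [120, 152, 170, 214, 410, 429, 568, 650, 795,
         915, 745, 530, 259, 152, 107, 50, 25]
lower_limits = [27.5 + 5 * k for k in range(len(freqs))]   # 27.5, 32.5, ..., 107.5

def grouped_quantile(lower_limits, width, freqs, fraction):
    """Value below which `fraction` of the items lie, by interpolation in a class."""
    target = fraction * sum(freqs)          # e.g. N/4 for the first quartile
    below = 0                               # cumulative frequency of lower classes
    for lower, f in zip(lower_limits, freqs):
        if below + f >= target:
            return lower + width * (target - below) / f
        below += f
    raise ValueError("fraction must lie between 0 and 1")

q1 = grouped_quantile(lower_limits, 5, freqs, 0.25)
q3 = grouped_quantile(lower_limits, 5, freqs, 0.75)
print(round(q1, 2), round(q3, 1))
```

With a fraction of 0.5 the same function returns the median.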


Locating Graphically the Quartiles, etc.

Like the median, the quartiles, etc., may be located by
means of graph with the aid of ogive curve or cumulative
frequency curve. This is clear from fig. 10.3.

The Mode 

The mode, strictly defined, is that value of the variable
which occurs or repeats itself the greatest number of times. The
mode is the most 'fashionable' size in the sense that it is the
most common and typical, and is defined by Zizek as "the
value occurring most frequently in a series (or group) of items
and around which the other items are distributed most
densely."

Fig. 10.2

Fig. 10.3





The mode of a distribution is the value at the point
around which the items tend to be most heavily concentrated.
It is the most frequent or the most common value, provided
that a sufficiently large number of items are available to give
a smooth distribution. It will correspond to the value of the
maximum point (ordinate) of a frequency distribution if it is
an 'ideal' or smooth distribution. It may be regarded as the
most typical of a series of values. The modal wage, for
example, is the wage received by more individuals than any
other wage. The modal hat size is that which is worn by
more persons than any other single size.

It may be noted that the occurrence of one or a few extremely
high or low values has no effect upon the mode. If a
series of data are unclassified, not having been either arrayed
or put into a frequency distribution, the mode cannot be
readily located.

Taking first an extremely simple example, if seven men
are receiving daily wages of Rs. 5, 6, 7, 7, 7, 8 and 10, it is
clear that the modal wage is Rs. 7 per day. If we have a
series such as 2, 3, 5, 6, 7, 10 and 11, it is apparent that there
is no mode.
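Locating the most frequently repeated value can be sketched in Python as below. The function name `modes` is illustrative, and the two series are small illustrative arrays in the spirit of the wage example: one with a clear mode and one in which no value repeats.

```python
from collections import Counter

def modes(values):
    """Values occurring most often; an empty list when no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []                   # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == top)

print(modes([5, 6, 7, 7, 7, 8, 10]))
print(modes([2, 3, 5, 6, 7, 10, 11]))
```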

There are several methods of estimating the value of the
mode. But it is seldom that the different methods of ascertaining
the mode give us identical results. Consequently it
becomes necessary to decide as to which method would be
most suitable for the purpose in hand. In order that a choice
of the method may be made intelligently we should understand
carefully each of the methods and the differences that exist
among them.

The following are the four important methods of estimating
mode of a series :

(i) Locating the most frequently repeated value in
the array ;

(ii) Estimating the mode by interpolation ;

(iii) Locating the mode by graphic method ; and

(iv) Estimating the mode from the mean and the
median.



(i) Locating the Most Frequently Repeated Value in the
Array

The data are first arrayed, and not infrequently the mode
is at once apparent, the value occurring most frequently being
the mode of that distribution. If we look at Table 7.2 we
will find that 40 is repeated more often (7 times) than any
other value. Thus '40', being the most frequently repeated
value, is the mode of this series. This is evident from Table
7.3 which gives the frequency distribution of marks.

When there is no apparent mode clearly revealed by the
frequency distribution it is necessary to re-group the figures.
This is done by widening the classes ; a procedure which
smooths out irregularities. This is shown in the following
example :


Number of Articles Sold by Sizes

(Each grouped total is entered against the first size of its group.)

 Size      f
          (1)     (2)     (3)     (4)     (5)     (6)
   1       2       9              22
   2       7              20              35
   3      13      28                              48
   4      15              35      60
   5      20      45                      68
   6      25              48                      72
   7      23      47              67
   8      24              44              67
   9      20      43                              58
  10      23              38      52
  11      15      29                      55
  12      14              40                      59
  13      26      45
  14      19




The procedure is as follows : the frequencies are grouped
by twos to obtain column 2 ; and again by twos, starting with
the second item, to obtain column 3. Then they are grouped
by three items, starting with the first item, then with the
second, and lastly with the third item to obtain columns 4, 5
and 6. If necessary, grouping can be done in fours. As each
of the groupings is completed the maximum grouped frequency
is underlined or indicated in bolder figures. It will be seen
that the 6th size is included in 3 maxima of grouped frequencies,
the 7th in 5 and the 8th in 3, hence the mode is the size of
the 7th group, i.e., 7, and the other most popular sizes are
6 and 8 (See Analysis Table).


Analysis Table

                          Sizes
 Column No.      5     6     7     8     9    13
     1                                         1
     2                       1     1
     3                 1     1
     4                       1     1     1
     5           1     1     1
     6                 1     1     1

 No. of items    1     3     5     3     1     1
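The grouping procedure and the analysis table can be sketched in Python as follows. This is an illustrative sketch; `grouping_votes` is a name chosen here, and the frequencies are those of the table of articles sold by sizes. For each grouping the sizes belonging to the maximum grouped frequency receive one mark each.

```python
from collections import Counter

sizes = list(range(1, 15))
freqs = [2, 7, 13, 15, 20, 25, 23, 24, 20, 23, 15, 14, 26, 19]

def grouping_votes(sizes, freqs, max_span=3):
    """Count, for every size, how many grouping columns place it in their maximum."""
    votes = Counter()
    for span in range(1, max_span + 1):        # group singly, by twos, by threes
        for start in range(span):              # shift the starting item
            starts = range(start, len(freqs) - span + 1, span)
            totals = {k: sum(freqs[k:k + span]) for k in starts}
            best = max(totals.values())
            for k, total in totals.items():
                if total == best:              # mark every size in the maximum group
                    votes.update(sizes[k:k + span])
    return votes

votes = grouping_votes(sizes, freqs)
mode_size = votes.most_common(1)[0][0]
print(dict(sorted(votes.items())), mode_size)
```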


(ii) Estimating the Mode by Interpolation

In the case of continuous frequency distributions the
problem of determining the value of the mode is not so simple
as it might have appeared from the foregoing description.
Having located the modal class of the data, the next problem
in the case of continuous series is to interpolate the value of
the mode within this 'modal class'.

This interpolation is made by the use of any one of the
formulae.

(a) (i)  $$M_o = l_1 + \frac{f_2}{f_0 + f_2} \times i$$

    (ii) $$M_o = l_2 - \frac{f_0}{f_0 + f_2} \times i$$

(b)      $$M_o = l_1 + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i$$

    or   $$M_o = l_2 - \frac{f_1 - f_2}{2f_1 - f_0 - f_2} \times i$$

where $l_1$ is the lower limit of the modal class, $l_2$ is the upper
limit of the modal class, $f_0$ equals the frequency of the class next
below the modal class in value, $f_1$ equals the frequency of the
modal class, $f_2$ equals the frequency of the following
class (class next above the modal class) in value, and $i$ equals the
interval of the modal class.


TABLE 10.9

          Wage-group                  Frequency
Exceeding     but not exceeding
 14 as.           18 as.                  6
 18 as.           22 as.                 18
 22 as.           26 as.                 19
 26 as.           30 as.                 12
 30 as.           34 as.                  5
 34 as.           38 as.                  4
 38 as.           42 as.                  3
 42 as.           46 as.                  2
 46 as.           50 as.                  1
 50 as.           54 as.                  0
 54 as.           58 as.                  1


In the given example (Table 10.9) the lower limit of the
modal class is 22, its upper limit 26, its frequency 19, the
frequency of the preceding class 18, and that of the following one 12.
The class interval is 4. Using these we have :

(a) (i)  $$M_o = 22 + \frac{12 \times 4}{18 + 12} = 22 + 1.6 = 23.6$$

    (ii) $$M_o = 26 - \frac{18 \times 4}{18 + 12} = 26 - 2.4 = 23.6$$

(b)      $$M_o = 22 + \frac{19 - 18}{2 \times 19 - 18 - 12} \times 4 = 22 + 0.5 = 22.5$$
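The three interpolations can be checked with a short Python sketch. The function names are hypothetical; the arguments follow the symbols defined above (lower limit, upper limit, interval, and the frequencies of the preceding, modal and following classes), and the figures are those of Table 10.9.

```python
def mode_a_lower(l1, i, f0, f2):
    """Formula (a)(i): interpolate upwards from the lower limit."""
    return l1 + f2 / (f0 + f2) * i


def mode_a_upper(l2, i, f0, f2):
    """Formula (a)(ii): interpolate downwards from the upper limit."""
    return l2 - f0 / (f0 + f2) * i


def mode_b(l1, i, f0, f1, f2):
    """Formula (b): uses differences from the modal frequency."""
    return l1 + (f1 - f0) / (2 * f1 - f0 - f2) * i


# Modal class 22-26 of Table 10.9: f0 = 18, f1 = 19, f2 = 12, i = 4
est_a1 = mode_a_lower(22, 4, 18, 12)
est_a2 = mode_a_upper(26, 4, 18, 12)
est_b = mode_b(22, 4, 18, 19, 12)
print(round(est_a1, 2), round(est_a2, 2), round(est_b, 2))
```

Formulae (a)(i) and (a)(ii) necessarily agree, since the two fractions add up to one class interval.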

The above formulae (a) (i) and (a) (ii), it will be noticed,
use the frequency of the classes adjoining the modal class to
pull the estimate of the mode away from the mid-point towards
either the upper or lower class limit. In this particular case
the frequency of the class preceding the modal class is more
than the frequency of the class following and, therefore, the
estimated mode is less than the mid-value of the modal class.
This also seems quite logical. If the frequencies are more on
one side of the modal class than on the other, it can be reasonably
concluded that the items in the modal class are concentrated
more towards the class limit of the adjoining class with
the larger frequency.

The above formula (b) is also based on a logic similar to
that of (a) (i) and (a) (ii). In this case, to interpolate the
value of mode within the modal class, the differences between
the frequency of the modal class and the respective frequencies
of the classes adjoining it are used. This formula usually
gives better results than formulae (a), and its results
are exactly equal to those obtained by the graphic
method. The formula (a) gives values which are different
from the value obtained by formula (b) and are closer to
the central point of the modal class. If the frequencies of the classes
adjoining the modal class are equal, the mode is expected to be
located at the mid-value of the modal class, but if the frequency
on one of the sides is greater the mode will be pulled away
from the central point ; it will be pulled more and more as
the difference between the frequencies of the classes adjoining
the modal class becomes higher and higher. In the example given
above the frequency of the modal class is 19 and that of the
preceding class is 18. So the mode should be quite close to
the lower limit of the modal class. The mid-point of the modal
class is 24 and the lower limit of the modal class is 22. The
value obtained by formula (a) is 23.6, quite close to the central
value, and that obtained by formula (b) is 22.5, a value close to
the lower limit.

Sometimes the formula (b) can give absurd results. If the
frequency of either the preceding or the following class is
greater than the frequency of the modal class, it may give a
value which does not lie within the modal class. In such a
case formula (a) may be used ; it will never give a value which
lies outside the modal class.
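A small numerical sketch makes the point. The frequencies below are hypothetical (they are not from the text): the class taken as modal has frequency 19, but the preceding class has 20. Formula (b) then falls below the lower limit, while formula (a) stays inside the class.

```python
def mode_a(l1, i, f0, f2):
    """Formula (a)(i)."""
    return l1 + f2 / (f0 + f2) * i


def mode_b(l1, i, f0, f1, f2):
    """Formula (b)."""
    return l1 + (f1 - f0) / (2 * f1 - f0 - f2) * i


# Hypothetical modal class 22-26 with f0 = 20, f1 = 19, f2 = 5
l1, l2, i = 22, 26, 4
by_a = mode_a(l1, i, f0=20, f2=5)
by_b = mode_b(l1, i, f0=20, f1=19, f2=5)
print(round(by_a, 2), round(by_b, 2))
```

Formula (a) multiplies the interval by a fraction between 0 and 1, which is why its result cannot leave the class.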

(iii) Locating the Mode by Graphic Method

The method of graphic interpolation is illustrated in the
following diagram. The upper corners of the rectangle over
the modal class have been joined by straight lines to those of
the adjoining rectangles as shown in the diagram ; the right
corner to the corresponding one of the adjoining rectangle on
the left, etc. If a perpendicular is drawn from the point of
intersection of these lines, we have a value for the mode indicated
on the base line. This graphic approach is in principle
similar to the arithmetic interpolation explained above as
formula (b).

Fig. 10.4
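The agreement between the graphic method and formula (b) can be verified numerically. The sketch below assumes the construction described above: one line joins the top-right corner of the preceding rectangle, at (l1, f0), to the top-right corner of the modal rectangle, at (l2, f1); the other joins the top-left corner of the modal rectangle, at (l1, f1), to the top-left corner of the following rectangle, at (l2, f2). The function name is hypothetical.

```python
def graphic_mode(l1, l2, f0, f1, f2):
    """Abscissa of the point where the two joining lines intersect."""
    slope_rising = (f1 - f0) / (l2 - l1)   # line from (l1, f0) to (l2, f1)
    slope_falling = (f2 - f1) / (l2 - l1)  # line from (l1, f1) to (l2, f2)
    # Solve f0 + slope_rising*(x - l1) = f1 + slope_falling*(x - l1) for x
    return l1 + (f1 - f0) / (slope_rising - slope_falling)


# Modal class 22-26 of Table 10.9: f0 = 18, f1 = 19, f2 = 12
x = graphic_mode(22, 26, 18, 19, 12)
formula_b = 22 + (19 - 18) / (2 * 19 - 18 - 12) * 4
print(x, formula_b)
```

Solving the two line equations algebraically gives exactly the expression of formula (b), so the two values coincide.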

The mode may also be determined graphically from an
ogive or cumulative frequency curve. It is found by drawing
a perpendicular to the base from that point on the curve
where the curve is most nearly vertical, i.e., steepest (in other
words, where it passes through the greatest distance vertically
and smallest distance horizontally). The point where it cuts
the base gives us the value of the mode. How accurately this
method determines the mode is governed by (1) the shape of
the ogive, (2) the scale on which the curve is drawn.

(iv) Estimating the Mode from the Mean and the
Median

There usually exists a relationship among the mean, median
and mode for moderately asymmetrical distributions.* If the
distribution is symmetrical, the mean, median and mode
will have identical values. But if the distribution is skew
(moderately), the mean, median and mode will pull apart. If
the distribution tails off towards higher values, the mean and
the median will be greater than the mode ; if it tails off towards
lower values, the mode will be greater than either of the other
two measures. In either case the median will be about one-third
as far away from the mean as the mode is. This means
that

$$\text{Mode} = \text{Mean} - 3\,(\text{Mean} - \text{Median}) = 3\,\text{Median} - 2\,\text{Mean}$$

In the case of the average monthly earnings (see Table 10.2)
the mean is 68.53 and the median is 70.2. If these values are
substituted in the above formula, we get :


* See Chapter 12. 





$$\text{Mode} = 68.5 - 3\,(68.5 - 70.2) = 68.5 + 5.1 = 73.6$$

According to the formula used earlier,

$$\text{Mode} = l_1 + \ldots \times i = 72.5 + \ldots \times 5 = 72.5 + 2.4 = 74.9$$

The difference between the two estimates is due to the fact 
that the assumption of relationship between the mean, median 
and mode may not always be true. It is obviously not valid 
in this case. 
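The empirical relation is a one-line computation. The sketch below (function name hypothetical) reproduces the first estimate from the mean and median quoted above and compares it with the interpolated value of 74.9.

```python
def empirical_mode(mean, median):
    """Mode estimated as 3 Median - 2 Mean (moderately skew distributions)."""
    return 3 * median - 2 * mean


estimate = empirical_mode(mean=68.5, median=70.2)
interpolated = 74.9  # value obtained by interpolation in the text
print(round(estimate, 1), round(interpolated - estimate, 1))
```

The gap of about 1.3 between the two estimates is the discrepancy the text attributes to the relation not holding for these data.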

Properties of Mode 

1. Mode represents the most typical value of the distribution
and should coincide with an existing item.

2. Mode, as usually computed, is not affected by the
presence of extremely large or small items.

Nature and Significance of Mode. In our everyday conversation
when we talk of an 'average' citizen it is usually the
modal citizen that we mean. Similarly 'average' income means
the modal income, i.e., the income which is most common.
The mode is the most descriptive average since it signifies the
most typical value in the series, and indicates the precise value
of an important part of it.

The mode is not affected by extreme values, and for this
reason would be a more representative average for many
purposes. For example, if the proprietor of a shoe store
wishes to know the average size of the shoes sold by him, he
would use the modal average rather than the arithmetic. The
mode may not always be well defined and as such it may not
be possible to locate it properly. There may be no clear cut
point of 'maximum density' and, therefore, the point taken
as mode may depend largely upon the judgment or desire
of the individuals computing it. When the data are given in
the form of a frequency distribution it is necessary, for the purpose
of determining the mode accurately, that the class interval be
properly chosen and the class limits be correctly placed.




The mode can be satisfactorily located from a distribution
of unequal classes, provided the modal class and those on
either side of it are of equal intervals. Nor do open end classes
cause any difficulty in its location.

It must, however, be observed that unless the number of
items is fairly large and the distribution reveals a distinct
central tendency, the mode has no significance. Also, unlike
the arithmetic mean, it cannot be subjected to mathematical
treatment.


The Geometric Mean

The geometric mean of n positive values is the nth root of
their product. Thus it is obtained by multiplying together all
the values and then extracting the relevant root of the product.
It can be represented as :

$$\text{Geometric Mean} = \sqrt[n]{x_1 \times x_2 \times \ldots \times x_n}$$

where n stands for the number of measures and $x_1, x_2, \ldots, x_n$
are the various values. For instance, the geometric mean of 4, 8, 16 is

$$GM = \sqrt[3]{4 \times 8 \times 16} = \sqrt[3]{512} = 8$$

The above method of calculating geometric mean is satisfactory
only if there are two or three items. But if n is a large
number the problem of computing the nth root of the product
of these values by simple arithmetic is tedious work. To
facilitate the computation of geometric mean we make use of
logarithms. The above formula when reduced to its logarithmic
form will be :

$$\log GM = \frac{\log x_1 + \log x_2 + \ldots + \log x_n}{n}$$

The logarithm of the geometric mean is equal to the
arithmetic mean of the logarithms of individual values. The
actual process involves obtaining the logarithm of each value,
adding them and dividing the sum by the number of items,
i.e., n. The quotient so obtained is then looked up in the tables
of anti-logarithms which will give us the geometric mean.

Illustration : Find the GM of 2, 4, 8, 12, 16, 24.

Solution :

    Value         log
       2        0.3010
       4        0.6021
       8        0.9031
      12        1.0792
      16        1.2041
      24        1.3802
                ------
    Total       5.4697

$$\text{Geometric Mean} = \text{antilog}\,\frac{5.4697}{6} = \text{antilog}\; 0.9116 = 8.158 \quad \text{Ans.}$$
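The logarithmic procedure translates directly into code. The sketch below uses only the standard `math` module; the function name is hypothetical, and base-10 logarithms are used to mirror the log tables.

```python
import math


def geometric_mean(values):
    """Antilog of the arithmetic mean of the base-10 logarithms."""
    logs = [math.log10(v) for v in values]
    return 10 ** (sum(logs) / len(logs))


gm_small = geometric_mean([4, 8, 16])
gm_six = geometric_mean([2, 4, 8, 12, 16, 24])
print(round(gm_small, 3), round(gm_six, 3))
```

Four-figure log tables give 8.158 for the second series; the machine value differs only in the later decimal places.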

If it is desired to find the geometric mean of a frequency
distribution we adopt the following formula :

$$\log GM = \frac{f_1 \log x_1 + f_2 \log x_2 + f_3 \log x_3 + \ldots + f_n \log x_n}{N}$$

where $f_1, f_2, f_3, \ldots, f_n$ respectively represent the frequencies
of $x_1, x_2, x_3, \ldots, x_n$ and N the total of frequencies. This is
also called the weighted geometric mean, because the frequency
of each class (or value) represents its relative importance in
the series.

Thus the steps involved in the calculation of GM are :

(1) In the case of an individual observation series find the
logarithms of the values and add them. But in the case of a
frequency distribution write down beside each frequency the
logarithm of the corresponding size or class-mark as the case
may be.

(2) Multiply each logarithm by the corresponding
frequency.

(3) Add the products.

(4) Divide the sum of these products by the total
frequency.

(5) This yields the logarithm of the GM. To find the
GM, take the antilog.

Formula :

$$\log GM = \frac{\Sigma (f \log x)}{N}$$

Illustration :

Class          M. Value    Frequency    log of M    f x (log M)
intervals         M            f
 9.5-14.5        12           10         1.0792       10.7920
14.5-19.5        17           15         1.2304       18.4560
19.5-24.5        22           17         1.3424       22.8208
24.5-29.5        27           25         1.4314       35.7850
29.5-34.5        32           18         1.5051       27.0918
34.5-39.5        37           12         1.5682       18.8184
39.5-44.5        42            8         1.6232       12.9856
                             ---                     --------
                             105                     146.7496

$$\text{Geometric Mean} = \text{antilog}\,\frac{146.7496}{105} = \text{antilog}\; 1.3976 = 24.98$$
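The five steps for a frequency distribution can be sketched as follows. The function name is hypothetical; the mid-values and frequencies are those of the illustration above.

```python
import math


def grouped_geometric_mean(mid_values, freqs):
    """Weighted GM: antilog of sum(f * log M) divided by the total frequency."""
    total_freq = sum(freqs)
    weighted_logs = sum(f * math.log10(m) for m, f in zip(mid_values, freqs))
    return 10 ** (weighted_logs / total_freq)


mid_values = [12, 17, 22, 27, 32, 37, 42]
freqs = [10, 15, 17, 25, 18, 12, 8]
gm_grouped = grouped_geometric_mean(mid_values, freqs)
print(sum(freqs), round(gm_grouped, 2))
```

The result agrees with the tabular working to two decimal places; small differences beyond that come from rounding the logarithms to four figures.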

Characteristics of Geometric Mean

1. If G is the combined GM of a number of series having
individual GMs $G_1, G_2, \ldots, G_r$,

then $$\log G = \frac{1}{N}\,[N_1 \log G_1 + N_2 \log G_2 + \ldots + N_r \log G_r]$$

where $N_1, N_2, \ldots, N_r$ are the numbers of items in each series
and N is the total number of items,

or $$N \log G = N_1 \log G_1 + N_2 \log G_2 + \ldots + N_r \log G_r$$

2. The geometric mean of the ratios of the corresponding
observations in two series is equal to the ratio of the geometric
means,

because if $X = X_1 / X_2$

then $\log X = \log X_1 - \log X_2$

Summing for all X's and dividing by N, the number
of terms in each series, we get $\log G = \log G_1 - \log G_2$, i.e.,

$$G = G_1 / G_2$$

3. The geometric mean of the product of various series
is the product of the individual GMs.

4. In a series there will always be some measures which
are less than its geometric mean and some which are more.
If the ratios of the geometric mean to the measures which are
smaller than it be multiplied together, the product so obtained
will be equal to that obtained by multiplying together the
ratios to the geometric mean of the measures that exceed it in
value. This can be shown by the following example :

The geometric mean of 3, 4, 9, 12 is 6. This means that

$$\frac{6}{3} \times \frac{6}{4} = \frac{9}{6} \times \frac{12}{6}$$
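Two of these characteristics can be checked numerically with the series 3, 4, 9, 12. This is a sketch only: the helper `gm` is hypothetical, and splitting the series into the two sub-series [3, 4] and [9, 12] is an arbitrary choice made for the illustration.

```python
import math


def gm(values):
    """Geometric mean via natural logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


series = [3, 4, 9, 12]
g = gm(series)

# Characteristic 4: products of ratios on either side of the GM are equal
below = math.prod(g / v for v in series if v < g)
above = math.prod(v / g for v in series if v > g)

# Characteristic 1: combined GM from the GMs of two sub-series
part_a, part_b = [3, 4], [9, 12]
log_combined = (len(part_a) * math.log(gm(part_a))
                + len(part_b) * math.log(gm(part_b))) / len(series)
combined = math.exp(log_combined)

print(round(g, 6), round(below, 6), round(above, 6), round(combined, 6))
```

Both products come to 3, and the combined value equals the GM of the whole series.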

Uses of Geometric Mean 

From this we can conclude that this is an important method
of averaging ratios. Its main use is in the construction of
index numbers of prices. A rise in prices from 100 to 1000
should balance a fall in the prices of another commodity
from 100 to 10. The arithmetic mean is not able to achieve
this, since (1000 + 10) / 2 = 505 shows that if the changes in
prices are averaged there is an increase of 405 per cent. This
obviously is wrong since the two ratios of change in opposite
directions are equal and the increase in one should balance
the decrease in the other. The geometric mean is able to
do this.

$$GM = \sqrt{1000 \times 10} = 100$$
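The contrast between the two averages of the price relatives can be reproduced in a few lines. This is a minimal sketch using the two figures from the text; the variable names are illustrative.

```python
import math

# Price relatives (base = 100): one commodity rises tenfold, the other falls to a tenth
relatives = [1000, 10]

arithmetic = sum(relatives) / len(relatives)
geometric = math.sqrt(relatives[0] * relatives[1])

print(arithmetic, geometric)
```

The arithmetic mean reports a rise to 505, whereas the geometric mean stays at the base level of 100, treating the two equal ratios of change as offsetting each other.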

The geometric mean of a series of different measures is 
always less than its arithmetic mean and consequently it is used 
when it is desirable that undue importance shall not be given 
to large values of the variable. 




The Harmonic Mean 

Like the geometric mean, another measure of central tendency
that is of importance in solving some special types of
problems is the harmonic mean. Like the geometric mean,
this is unfamiliar and difficult to interpret ; that is why it is
probably less used than any of the other measures of central
tendency.

"The HM is called for in problems about work, time and
rate, where the amount of work is held constant and an average
rate is required ; or in problems about total cost, number of
persons and per capita cost, when the total cost is held constant
and an average per capita cost is called for, or in problems of
similar nature involving rates."

Just as the geometric mean is based on an arithmetic mean
of logarithms so is the harmonic mean based on an arithmetic
mean of reciprocals. We may define it as the reciprocal of the
arithmetic mean of the reciprocals of the given numbers. If
the given numbers are $x_1, x_2, x_3, \ldots, x_n$, then

$$HM = \frac{n}{\dfrac{1}{x_1} + \dfrac{1}{x_2} + \dfrac{1}{x_3} + \ldots + \dfrac{1}{x_n}}$$

The HM of 2, 4, 6 is

$$\frac{3}{\dfrac{1}{2} + \dfrac{1}{4} + \dfrac{1}{6}} = \frac{3}{\dfrac{11}{12}} = 3\tfrac{3}{11}$$

Example :

A man travelled by car for 3 days. He covered 480 miles
each day. He drove :

1st day   10 hours at 48 miles per hour
2nd  "    12   "   "  40   "    "   "
3rd  "    15   "   "  32   "    "   "

What was his average speed ?

Solution :

Here we note that the total distance covered in each trip
(day) is constant and equal to 480 miles.

$$\text{Average speed, i.e., } HM = \frac{3}{\dfrac{1}{48} + \dfrac{1}{40} + \dfrac{1}{32}} = \frac{3}{\dfrac{37}{480}} = 38\tfrac{34}{37} \text{ miles per hour.}$$
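The result can be confirmed in exact arithmetic. The sketch below uses the standard `fractions` module; the function name is hypothetical. It also checks that the harmonic mean of the speeds equals total distance divided by total time, which is why it is the right average when the distance per day is constant.

```python
from fractions import Fraction


def harmonic_mean(values):
    """Reciprocal of the arithmetic mean of the reciprocals (exact)."""
    return len(values) / sum(Fraction(1, v) for v in values)


speeds = [48, 40, 32]   # miles per hour on the three days
hours = [10, 12, 15]    # hours driven each day (480 miles per day)

hm = harmonic_mean(speeds)
overall = Fraction(3 * 480, sum(hours))  # total distance / total time

print(hm, overall, float(hm))
```

Both expressions reduce to 1440/37, i.e. 38 and 34/37 miles per hour.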


Example :

A man travelled by car for 3 days. He drove 10 hours each
day. He drove :

1st day   10 hours at 45 miles per hour
2nd  "    10   "   "  40   "    "   "
3rd  "    10   "   "  38   "    "   "

What was his average speed ?

Solution :

Here the time of each trip is constant.

$$\text{Average speed} = \text{Arithmetic Mean} = \frac{45 + 40 + 38}{3} = 41 \text{ miles per hour.}$$

The students should note carefully the difference between 
the two examples. 


Selecting the Average

The process of computing any one of the five averages
discussed in this chapter is comparatively simple. But it is not
always easy to choose one particular average which may represent
a statistical distribution for the purpose of the inquiry that
we have in hand. The problem of selecting an average will,
however, be considerably facilitated if we are aware of their
individual characteristics and their advantages and disadvantages.
Below is given a summary of the characteristics,
advantages and disadvantages of each average.

Arithmetic Mean 

1. The value of the arithmetic mean is determined by
every item in the series.

2. It is greatly affected by extreme values.

3. The sum of the deviations about it is zero.

4. The sum of the squares of deviations from the arithmetic
mean is less than that of those computed from any other point.

Advantages :

1. It is easily calculated.

2. It has a determinate value.

3. It is most easily understood.

4. It is most amenable to algebraic treatment ; for example,
if we have the averages of the heights of students for each class
we may find the average for the entire college.

Disadvantages :

It may be greatly affected by extreme items and its usefulness
as a 'summary of the whole' may be considerably reduced.
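The characteristics of the arithmetic mean listed above can be verified on a small series. Everything in this sketch is hypothetical: the five values, the class averages and the class sizes are made up for illustration only. Note that combining class averages requires the number of students in each class.

```python
values = [3, 7, 8, 10, 22]          # hypothetical series
mean = sum(values) / len(values)

# Characteristic 3: deviations about the mean sum to zero
sum_of_deviations = sum(v - mean for v in values)


# Characteristic 4: squared deviations are smallest about the mean
def sum_of_squares(point):
    return sum((v - point) ** 2 for v in values)


about_mean = sum_of_squares(mean)
about_other = sum_of_squares(8)     # any other point, here the median

# Advantage 4: college average from (hypothetical) class averages and sizes
class_means = [160.0, 165.0]
class_sizes = [40, 60]
college_mean = (sum(m * n for m, n in zip(class_means, class_sizes))
                / sum(class_sizes))

print(mean, sum_of_deviations, about_mean, about_other, college_mean)
```

The extreme value 22 also shows the disadvantage noted above: it pulls the mean to 10, above four of the five items.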

Median 

1. It is an average of position.

2. It is affected more by the number of items than by the
extreme values.

3. The sum of the deviations about the median, signs
ignored, will be less than the sum of the deviations taken from
any other point.

Advantages :

1. It is easily calculated and is not much disturbed by
extreme values.

2. It is more typical of the series.

3. The median may be located even when the data are
incomplete, e.g., when the class intervals are irregular and the
final classes have open ends. This gives a special advantage in
dealing with economic and social data.

Disadvantages :

1. The median is not so well suited to algebraic treatment
as the arithmetic, geometric and harmonic means.

2. It is not so generally familiar as the arithmetic mean.

Mode

1. It is an average of position.

2. It is not affected by extreme values.

3. It is the most typical value of the distribution.

Advantages :

1. Since it is the most typical value it is the most descriptive.

2. It is easy to locate the approximate mode, but the
determination of true mode requires extensive calculations.

3. Since the mode is usually an 'actual value', it indicates
the precise value of an important part of the series.

Disadvantages :

1. Unless the number of items is fairly large and the
distribution reveals a distinct central tendency, the mode has
no significance.

2. It is not capable of mathematical treatment.

3. In a small number of items the mode may not exist.

Geometric Mean 

1. It is a calculated value and depends upon the size of
all the items.

2. It gives less importance to extreme items than does the
arithmetic mean.

3. For any series of items it is always smaller than the
arithmetic mean.

4. It exists ordinarily only for positive values of the variate.

Advantages :

1. Since it is less affected by extremes it is a more typical
average than the arithmetic mean.

2. Since it gives equal weight to equal ratios of change,
it is particularly well adapted when ratios of change are to be
averaged.

3. It is capable of algebraic treatment.

Disadvantages :

1. Its computation is relatively difficult.

2. It cannot be determined if there is any negative value
in the distribution, or where one of the items has a zero value.

3. It is not a widely known average.

Harmonic Mean

1. It is difficult to compute and is not understandable to 
the common man. 

2. It is useful in the averaging of time rates and other 
similar phenomena. 
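For a quick side-by-side comparison, the five averages summarised above are all available in Python's standard `statistics` module (`geometric_mean` requires Python 3.8 or later). The series below is hypothetical and chosen only to show the typical ordering of the three calculated means.

```python
import statistics

data = [2, 4, 4, 8, 16]   # hypothetical positive values

averages = {
    "arithmetic mean": statistics.mean(data),
    "median": statistics.median(data),
    "mode": statistics.mode(data),
    "geometric mean": statistics.geometric_mean(data),
    "harmonic mean": statistics.harmonic_mean(data),
}

for name, value in averages.items():
    print(f"{name:16s} {value:.3f}")
```

For positive values that are not all equal, the harmonic mean is below the geometric mean, which in turn is below the arithmetic mean; the single large item 16 affects the arithmetic mean most.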




From the above summary of the strengths and weaknesses of
the various averages it is apparent that each type of average
has its own particular field of usefulness. It is only for certain
purposes and under certain definite conditions that a particular
average is best suited, and there is no such thing as 'an
all-purpose average'. In the matter of selecting a suitable average,
therefore, it is necessary to understand carefully the nature of
the data and the purpose which the average is intended to serve.
If it is desired to give a complete description of a frequency
distribution no single average will be enough and it will be
necessary to determine all the three chief averages (mean, median
and mode). Of all the averages the arithmetic mean is the
most popular and is commonly used in statistical work. This
is due mainly to the simplicity of its computation and the fact
that its meaning is perfectly definite and quite familiar to
people in general. The arithmetic mean, however, is not of
universal usefulness. It should not be applied when the purpose
of the inquiry or the nature of the data suggests the use of
some other average. If, for example, it is desired to find the
most common size, the mode would be best suited for the purpose ;
when ratios are to be averaged it is the geometric mean
which will provide the most reliable results ; and the size of
the average item will be given only by the median.

Illustration :

The marks obtained by classes A and B are given below.
Find the mean, median and mode for each of these two classes.
Comment upon the results.

(Delhi University, 1951)

Marks obtained      Class A      Class B
     5-10               1            5
    10-15              10            6
    15-20              20           15
    20-25               8           10
    25-30               6            5
    30-35               3            4
    35-40               1            2
    40-45               0            2
                      ---          ---
                       49           49






Solution :

                       Class A                          Class B
Marks        Cum.    Freq.   Dev.             Cum.    Freq.   Dev.
obtained     freq.     f      d      fd       freq.     f      d      fd
 5-10          1       1     -2      -2         5       5     -2     -10
10-15         11      10     -1     -10        11       6     -1      -6
15-20         31      20      0       0        26      15      0       0
20-25         39       8      1       8        36      10      1      10
25-30         45       6      2      12        41       5      2      10
30-35         48       3      3       9        45       4      3      12
35-40         49       1      4       4        47       2      4       8
40-45         49       0      5       0        49       2      5      10
                      ---           ---                ---           ---
                      49            +21                49            +34

(d is the deviation of the mid-value from the assumed mean 17.5,
measured in class intervals of 5.)

Mean :

Class A : $\Sigma fd = 21$ ; $$\text{Mean} = 17.5 + \frac{21}{49} \times 5 = 19.64$$

Class B : $\Sigma fd = 34$ ; $$\text{Mean} = 17.5 + \frac{34}{49} \times 5 = 20.97$$

Class A

Median :

Median = the size of the $\frac{N}{2}$th item
       = the size of the 24.5th item

$$\text{Median} = 15 + \frac{5}{20} \times 13.5 = 15 + 3.4 = 18.4$$

Mode :

$$\text{Mode} = l_1 + \frac{f_2}{f_0 + f_2} \times i = 15 + \frac{8}{10 + 8} \times 5 = 15 + 2.22 = 17.22$$

Class B

Median :

Median = the size of the $\frac{N}{2}$th item
       = the size of the 24.5th item

$$\text{Median} = 15 + \frac{5}{15} \times 13.5 = 15 + 4.5 = 19.5$$

Mode :

$$\text{Mode} = l_1 + \frac{f_2}{f_0 + f_2} \times i = 15 + \frac{10}{6 + 10} \times 5 = 15 + 3.12 = 18.12$$

The mean, median and mode for class B are higher than for
class A. Thus the average level of intelligence of the students
of class B is higher from all points of view. The mean indicates
the average of the class as a whole ; the median indicates
the intelligence of the average student of the class ; and the
mode gives the level of intelligence of an ordinary student of
the class.
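The whole solution can be reproduced with one small function. This is a sketch under stated assumptions: equal class intervals, the median located at the (N/2)th item, and the mode interpolated with formula (a)(i), as in the working above. The function name is hypothetical.

```python
def grouped_averages(lower_limits, width, freqs):
    """Mean, median and mode of a frequency distribution with equal classes."""
    n = sum(freqs)
    mids = [low + width / 2 for low in lower_limits]
    mean = sum(f * m for f, m in zip(freqs, mids)) / n

    # Median: interpolate within the class containing the (n/2)th item
    half, cumulative = n / 2, 0
    for low, f in zip(lower_limits, freqs):
        if cumulative + f >= half:
            median = low + (half - cumulative) / f * width
            break
        cumulative += f

    # Mode: formula (a)(i) applied to the class of greatest frequency
    # (assumes the adjoining classes are not both empty)
    k = freqs.index(max(freqs))
    f0 = freqs[k - 1] if k > 0 else 0
    f2 = freqs[k + 1] if k < len(freqs) - 1 else 0
    mode = lower_limits[k] + f2 / (f0 + f2) * width
    return mean, median, mode


lower_limits = [5, 10, 15, 20, 25, 30, 35, 40]
class_a = grouped_averages(lower_limits, 5, [1, 10, 20, 8, 6, 3, 1, 0])
class_b = grouped_averages(lower_limits, 5, [5, 6, 15, 10, 5, 4, 2, 2])
print([round(v, 2) for v in class_a])
print([round(v, 2) for v in class_b])
```

All three averages come out higher for class B, in line with the comment above; the unrounded median of class A is 18.375, which the manual working rounds to 18.4.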




Weighted Mean


Weighted mean is the name given to an average when the various values
of the variable have different frequencies ; or when the several
items, of which it is desired to find an average, are given multipliers
which express, more or less adequately, their relative
importance in some connection, for example, in the compilation
of index numbers,¹ or in finding standardised rates.

If $x_1, x_2, x_3, \ldots, x_n$ are the values of a variable and
$w_1, w_2, w_3, \ldots, w_n$ are their respective weights, then

$$\text{the weighted mean} = \frac{\Sigma (wx)}{\Sigma w}$$

where w stands for weight.

The weight of a measure is a number assigned to it in order
to indicate its relative importance. The weights may be actual
or arbitrary.

Let us suppose that a building contractor employs three
categories of workers (men, women and boys) and pays them
a daily wage of Rs. 3, 2 and 1 respectively. The arithmetic mean
of the three wage rates is thus $\frac{3 + 2 + 1}{3}$ = Rs. 2 per worker.
But this is not the average daily wage per worker paid by the
contractor unless he employs an equal number of men, women
and boys. Suppose he employs 30 men, 15 women and 5 boys,
then the average wage per worker is :

for 30 men    @ Rs. 3 per man    = 3 x 30 = Rs.  90
 "  15 women  @ Rs. 2 per woman  = 2 x 15 =  "   30
 "   5 boys   @ Re. 1 per boy    = 1 x  5 =  "    5
                                            -------
                                            Rs. 125

$$\frac{125}{50} = 2.5 \text{ rupees per worker.}$$


¹ J. Mounsey : Introduction to Statistical Calculations.

A study of the above illustration shows that the arithmetic
mean of 3, 2 and 1 can be found out in two ways. Under one
of these methods the three values are given equal importance
(weight) and the mean thus obtained is 2. Thus :

     x        f       fx
     3        1        3
     2        1        2
     1        1        1
            ---      ---
              3        6

$$\text{Mean} = \frac{6}{3} = 2$$

Under the second method these values are given an importance
of 6, 3 and 1 (or 30, 15 and 5) respectively and the mean
thus obtained is 2.5. Thus :

     x        f       fx
     3       30       90
     2       15       30
     1        5        5
            ---      ---
             50      125

$$\text{Mean} = \frac{125}{50} = 2.5$$


It is thus evident that the weight that we attach to a value
is an important factor in affecting the size of the mean.
Incorrect weighting will always lead to wrong results. Thus if
we attach equal weights to men, women and boys the AM 2 is
wrong, for arithmetic mean x number of workers should give us
the total wages paid by the contractor. Applying this we get :

Rs. 2 x 50 = Rs. 100

But the total wages paid are Rs. 125. Naturally, therefore,
the average is wrong. Applying the test to the second method
we get :

Rs. 2.5 x 50 = Rs. 125

Now Rs. 125 is the actual amount paid and hence 2.5 is
the correct average. In this case the weight assigned to each
value is determined by its frequency ; i.e., the frequency of each
value represents the relative importance of each value. It is,
therefore, obvious that the mean of a frequency distribution
is in fact a weighted mean.
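The contractor example can be reproduced with a short sketch. The function name is hypothetical; the wage rates and numbers of workers are those given in the text.

```python
def weighted_mean(values, weights):
    """Sum of (weight x value) divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


wages = [3, 2, 1]                                   # Rs. per day: men, women, boys

simple = weighted_mean(wages, [1, 1, 1])            # equal weights
by_numbers = weighted_mean(wages, [30, 15, 5])      # weights = workers employed
by_ratio = weighted_mean(wages, [6, 3, 1])          # same weights in lowest terms

# Test used in the text: mean x number of workers should equal total wages
total_wages = 3 * 30 + 2 * 15 + 1 * 5
print(simple, by_numbers, by_ratio, by_numbers * 50 == total_wages)
```

Only the relative size of the weights matters, so 6, 3 and 1 give the same answer as 30, 15 and 5.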

Illustration :

From the following data calculate the average price of tea :

Price per kgm.        Quantity sold
  Rs.  Paise             in kgm.
   1 - 00                  100
   1 - 50                  150
   2 - 00                  400
   2 - 50                  200
   3 - 00                   50


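Solution :

Price in Paise     kgm.     Deviation from        Total
  per kgm.         sold     assumed av. (200)    Deviation
    100            100          -100              -10000
    150            150           -50               -7500
    200            400             0                   0
    250            200           +50              +10000
    300             50          +100               +5000
                   ---                            ------
                   900                             -2500

(Negative deviations total -17500 and positive deviations +15000.)

$$\text{W. Mean} = 200 + \frac{-17500 + 15000}{900} = 200 - 2.77 = 197.23 \text{ Paise}$$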





In the above illustration the weight assigned to each value
is equal to its frequency, i.e., the weights are actual. In most
cases, however, the weighted arithmetic average is secured "by
applying to the items weights determined by some evidence of
importance other than that associated with items."

Example :

Commodity     Weight (w)     Value (x)      w x
   (1)           (2)            (3)          (4)
    A             50            200        10,000
    B             20            150         3,000
    C             10            125         1,250
    D              5            100           500
    E             15            120         1,800
                 ---                       ------
                 100                       16,550

$$\text{Average} = \frac{16{,}550}{100} = 165.5$$



In the above illustration the weights assigned to various
commodities are not the actual amounts spent by any one
family, but are based on the relative importance which we
think each one of these five commodities has in the family
budget of the class of people to which it refers.

Crude and corrected (or general and standardised) death rates.
Weighted arithmetic mean, besides possessing all the merits of
a simple arithmetic average, has an additional merit of facilitating
comparison between the death or birth rates of different
localities. Death or birth rates are generally expressed per
thousand, and represent the number of persons expired or
born in a certain locality during a certain year.

The crude or general death rate of a population is found
by dividing the total number of deaths in a year by the total
population and multiplying the result by 1000. If N represents
the total number of deaths in a year out of a population
numbering P the crude death rate is :

$$\frac{N}{P} \times 1000$$

That the crude rate is a weighted average can be proved in the 
following way : 

N and P may be analysed according to the ages of the
people. Let us suppose :

$P_1$ = the population of age above 0 and less than or equal to 10
$P_2$ = the population of age above 10 and less than or equal to 20

Then $P_1 + P_2 + \ldots + P_n = P$

Again, if

$N_1$ = deaths of people above 0 and less than or equal to 10
$N_2$ = deaths of people above 10 and less than or equal to 20

then $N_1 + N_2 + \ldots + N_n = N$

The formula for crude death rate given above, $\frac{N}{P} \times 1000$,
can be written as :

$$\frac{N_1 + N_2 + \ldots + N_n}{P} \times 1000$$

or $$\frac{N_1}{P} \times 1000 + \frac{N_2}{P} \times 1000 + \ldots + \frac{N_n}{P} \times 1000$$

$$= \frac{P_1}{P} \left( \frac{N_1}{P_1} \times 1000 \right) + \frac{P_2}{P} \left( \frac{N_2}{P_2} \times 1000 \right) + \ldots$$

Now $\frac{N_1}{P_1} \times 1000$ represents the death rate of the first age group,
$\frac{N_2}{P_2} \times 1000$ the death rate of the second age group and so on. If
the group death rates are represented by $d_1, d_2, d_3, \ldots, d_n$, the crude
death rate can be represented as :

$$\frac{P_1 d_1 + P_2 d_2 + \ldots + P_n d_n}{P}$$

Thus we can say that since the death rate of each age group
has been weighted by the population of that group the crude
death rate is a weighted mean.

Since the total population and its age composition differ
from locality to locality it follows that the crude death rates
obtained in the aforesaid manner are unequally weighted and
hence are not comparable. In order that they may be made
comparable, it is necessary that the weights applied to the
death rates of various age groups for two or more towns should
be equal. The age composition of the country as a whole, or
the age composition of any one town, may be regarded as the
standard and the death rates weighted by this standard. This
process is called the standardisation or correction of death
rates and the rates so obtained are called the standardised or
corrected death rates.

Illustration :

Find the crude and corrected death rates of towns A and B
from the following data :

                         Town A                            Town B
Age group      Population  Deaths  Specific      Population  Deaths  Specific
                                     Rate                              Rate
0-10              4,000      100      25            2,000       40      20
10-40            14,000      150      10.7          6,000       60      10
40 and over       2,000       50      25            4,000       80      20
                 ------      ---                   ------      ---
                 20,000      300                   12,000      180









Crude Death Rate :

$$\text{Town A} = \frac{300 \times 1000}{20000} = 15$$

$$\text{Town B} = \frac{180 \times 1000}{12000} = 15$$


Suppose the 'standard population' used as a basis of comparison
has the following age distribution :

Age distribution      Numbers
0-10                    300
10-40                   600
40 and over             100



                   Town A                               Town B
Standard     Specific    Standard pop.      Standard     Specific    Standard pop.
population     rate      x rate / 1000      population     rate      x rate / 1000
   300          25           7.5               300          20            6
   600          10.7         6.42              600          10            6
   100          25           2.5               100          20            2
  ----                      -----             ----                       ---
  1000                      16.42             1000                       14

Standardised Rate for A = 16.42
     "        "    "  B = 14
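Both rates can be computed with two small functions. This is a sketch with hypothetical function names; the populations, deaths and standard population are those of the illustration above. Because the specific rates are not rounded here, Town A's standardised rate comes out at about 16.43 rather than the 16.42 obtained with the rounded rate 10.7.

```python
def crude_rate(populations, deaths):
    """Total deaths per 1000 of total population."""
    return sum(deaths) / sum(populations) * 1000


def standardised_rate(populations, deaths, standard):
    """Age-specific rates weighted by a standard population."""
    rates = [d / p * 1000 for p, d in zip(populations, deaths)]
    return sum(s * r for s, r in zip(standard, rates)) / sum(standard)


standard = [300, 600, 100]                      # ages 0-10, 10-40, 40 and over
town_a = ([4000, 14000, 2000], [100, 150, 50])  # (population, deaths)
town_b = ([2000, 6000, 4000], [40, 60, 80])

crude_a, crude_b = crude_rate(*town_a), crude_rate(*town_b)
std_a = standardised_rate(*town_a, standard)
std_b = standardised_rate(*town_b, standard)
print(round(crude_a, 2), round(crude_b, 2), round(std_a, 2), round(std_b, 2))
```

The two towns have the same crude rate of 15, yet after standardisation Town A shows the higher mortality, which is the comparison the crude rates conceal.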








EXERCISES 

1. Define the various measures of central tendency. What purposes do their
measurements serve ?

2. Define geometric and harmonic mean and clearly explain their uses.

3. Show the relative positions of different averages in a moderately
asymmetrical series.

4. What do you mean by :

(a) Quartiles,
(b) Deciles,
(c) Percentiles ?

5. What are the qualities which an average must possess ? Which of the
averages you know possesses most of these qualities ?

6. What do you mean by 'weights' ? Why are they assigned ? Point
out a few cases in which weighted average should be used.

7. Differentiate between crude and corrected death rates.

8. The following are the monthly salaries, in rupees, of the employees in
a branch bank. Calculate the Arithmetic Mean.

10, 17. 20 . t)5, 100, iW, 175, ?50 and 750. 

(fl.Ciw,, fta/wrat, lOtti) 12’IJ 

9. The monthly incomes of ten families in rupees in a certain locality are
given below :

Family A B C D E F G H I J

Income HO 70 Mi 75 500 H 42 250 40 % 

Calculate the Arithmetic Average by (a) Direct Method, and
(b) Short Cut Method.

(fiCam., Agra , rpfj) [2 2; 

10. The following give* the weight* of 3) persons in a sample inquiry. 
Without taking an arbitrary mean, find the mean weight using (i) arithmetic
mean, (ii) geometric mean, (iii) harmonic mean.

Wright* in lb. M0 155 HO 145 HO MB 140 150 157 

Nc*. of Person* 5 4 fi (• 3 5 2 11 

{2 3, 141] 

11. Eight coins were tossed together and the number of heads resulting was
observed. The operation was performed 256 times and the frequencies
that were obtained for the different values of x, the number of heads,
are shown in the following table. Calculate mean, median and
quartiles of the distribution of x.

x     0 1 2 3 4 5 6 7 8

Freq. 1 9 26 59 72 52 29 7 1

<f»raes. t />WAi, r^) fH, % Ml] 

12. The following table gives the basic salary of persons employed in an
office. Calculate mean basic salary using arithmetic mean.

Salary in R*. fiO m (00 120 IbO *100 200 

No, of fVrauftt 5 H 12 22 10 7 fi 

I2.SJ 





13. The following table gives the number of persons with different incomes
in the U.S.A. during the year 1929 :


I neotn f in *000 dollar* 

Under t 

!~~2 
2-3 
3~5 
5 — 10 
10—25 
25—30 

50-.100 

im .looo 


No. of person* in lakhs 

' 13 

90 
81 
117 

m 

0 

27 

■> 

2 


Calculate the average income per head. Also calculate the range
within which the income of the middle 50% of persons lies.

[2.271 

14. Following is the frequency distribution of a certain variate. Calculate
all the measures of central tendency.


V'au.M"' V.ilue 1 

H--1 
l - H 
:i 5 

5 - Hi 

10.15 

t<<\ 

8 

H 

10 

12 

»a 

Variate Value 

15-25 

25.28 

28—30 

30—43 

4 V -60 

Freq. 

n 

10 

9 

a 

6 

1 The following table shows the sire of certain units sampled in various 

' nftl held* m India, in which zur.j is the average size of the coal* 
miumg unit* large*! ■* 

No- of Tcrvin* 

Jha»ia 

Kaniyan; 

C. T. 

Krtipkn rd N» 

»- of l ><iii 

No. of Units 

No. of Units 

below 50 

?, 

1 

l 

50- 100 

12 

4 

1 

100200 

»fi 

10 

It 

2W M 

16 

12 

3 

300- too 

H 

« 

5 

100-500 


if 

7 

500 11810 

19 

n 

r. 

f*>CH> arid ah»v e 

n 

9 

7 


■ fiomb0jr, <947) P^I 

16. Calculate the arithmetic mean of the following distribution :

TmOl prr Shop (MO 10.20 20-30 30-10 40-50 50-60 

No. of Shops !2 18 27 20 17 0 

Also find graphically the value of median.

BCm^Brnb*?, n^A 12.6, 2i>4] 

17. Obtain the mean, median and mode of the following series :

Marks 10-25 25-40 40-55 55-70 70415 85-100 

Frwv f» 20 44 26 3 I 

(A/./U, Em Agio, tw?) \2.% 2 . 20 , 2 . 3 l J 











18. From the following table calculate mean and median. By graph
verify the median.

Crop cutting experimental data on plot yields of wheat


Yield* In lb. 

No of Riots 

Yield* in lb. 

No. of Hot* 

Over 0 

216 

Over 300 

31 

60 

210 

„ 3*30 

13 

„ 120 

15 G 

420 

7 

180 

m 

„ 400 

2 

240 

S7 

Cpto 5|0 

210 


■ fi.Cvm., Sajgw. t v5#) [1.10,22,65] 

19. Find arithmetic mean, median, mode from the following :

Mark* fcwkm 10 10 30 40 50 00 70 80 

No, of .Student* 15 35 W M % 127 198 250 

/* Cam . ft at « 15 ^ 5 ?» (2.13,22) 

20 . The wage* of 1 000 employee# range from 4r M, to 19*. *V. They 

are grouped in 15 classes with a common class interval of 1s. Class
frequencies from lowest to the highest are 6, 17, 35, 48, 65, 90, 131,
173, 155, 117, 75, 52, 21, 9, 6.

Tabulate the data and calculate the mean wage.

; tltf?) [2.12] 

21. Find the arithmetic average, median and the quartiles from the
following distribution of 100 persons by age :

Age last Birthday : 15-19 20-24 25-29 30-34 35-39 40-44

Numbers : 4 20 38 24 10 4

Aitakabnd, m’st (2,13,22j 

Also determine the mode of the distribution.

22. From the table given below, find the mean and the median :


Marks 

No nf 
Candidates 

Marks 

No, of 
Candidate* 

l -7 

7 

21-25' " 

24 

6 — HI 

10 

2 b> 30 

18 

11 13 

1*3 

31—35 

10 

lf>« 2 « 

32 

:ii; —40 

5 



41 45 

1 


(BXVfl., Agra, [2.14,21,33] 
Also determine the mode of the distribution.

23. From the following data of calculations of arithmetic mean find the
missing item.

Houie Rent m H*. : 1 10 112 113 117 - 123 120 130 

No. of Homes 25 17 13 15 14 ti b 2 

Mr,in Rent : 113.81* 12.55] 

24. In a certain published data, about the number of tablets given to cure
fever, only the following were readable :


No,, nf tablet* 4 8 

12 

16 20 

24 

20 

32 % 40 

NV of pei tKMIf 

Cured H U 

Hi 

11 

9 

17 

fr 4 








It wm «ko Hated it I different jdace Out average number «f 
tablets given to cure fever wn» IM 2 . wnm is tk massing N^t^j 

25. (a) Average rainfall of a certain city from Monday to Saturday h •$*, 
Due to heavy niafitl on Sunday the average of the week increases 
to *5*. Whmt was the rainfall on Sunday P 

(b) A student found the mean of 30 item* at 38*6. He took 30 instead 
of 40 for one item. What b the correct mean ? (2,5ft) 

26. Determine the mode and the median from the following figures :

25. 15. 23. 40. 27. 25, 25, 25 and 20. 

(AX**., Agrit # 954 ) (2.15*28) 

27. Obtain the mean, the median and the mode of the following 44 figures
representing the length in half minute units of intervals between the
arrival of 45 successive calls at a certain Telephone Exchange. The
figures are :

ft, 1, IS. 19, 15, 9, 2, 2. I, 28, 7, 4ft, 16, 3, 15, 

2, 2. 2. fl. I, I, 2, 12, 1 , 1C, 7, ft, 4, 26, 15, 

3, 3, 5. 9, B, II, 40, IS, I, I. 13, 5, M, 16. 

(MA> t Afrm) (2.16,29) 

28. According to the census of 1940, the following are the population 

figures in '000 of the first 36 cities in India ; 

2488, 591, 437, 20, 213, 143, 1490, 407, 284, 176, 169. Iftl, 777, 

337, 302, 213, 204, 151, 733, 391, 263, 176, 170, 142, 522, 360, 

260, 193, 131, 92, 672, 258, 239, 160, 147, 151. 

Find the median and quart iJc* (2.17) 

29. Find the average height of a clerk in a certain office from the following
figures. What is the median height ? How does it differ from
the mode ?


Height 

Frequency 

Height 

Frequency 

5—6* 

1 

5—II* 

1 

5 7' 

2 

ft - O' 

2 

5~r 

4 

<v-~r 

1 

5 9' 

3 

6 2 ' 

1 

5.10* 

2 

6-3' 

* 


f Af A . dgm) (2.191 

30. Calculate median, quartiles, 6th decile and 70th percentile from the
following data :

Marks less than 80 70 60 50 40 30 20 10 

No of students 100 90 80 60 32 20 13 5 

(ft.Cam., Rij. t t*wt) (2.24 1 

31. Calculate median, mode, quartiles, 7th decile and 70th percentile from
the following data :


"ariate Value 

Freq. 

Variate Value 

Freq 

7—10 99 

5 

31 —34'99 

12 

It—14*90 

9 

35—3899 

7 

IS-~tfr99 

13 

39-42 99 

5 

19-22'99 

21 

43-46*99 

5 

23—26*99 

17 

47—50 99 

% 

27-30 99 

15 

51-54 99 

1 


( 2 > 2 «M 



32. Find the mode of the following series :

Mac: 4. X 6. 7, ft. ft. 10. H. 12. 13, I ft# 15, 16, 17, 18. 1ft. 

fit*, i 40, 48, 52, 56. 60, 63, 57, 55, 50, 52, 41, 57, 63, 52, 48, 40, 

{B.Cm., Bmmm, 1 & 43 ) f2.30j 
33. Compute the mode of the following distribution :

Site of Item*: 0- 5- 10- 15- 20- *3- 30- 33- 40- 45 

Frequency: 20 24 32 2ft 20 16 34 10-8 

(ft.Cen., DM, 1556 } {2,32) 

34. Recast the following cumulative table into the form of an ordinary
frequency distribution and determine the value of mode by using the
following formula :

$$\text{Mean} - \text{Mode} = 3\,(\text{Mean} - \text{Median})$$


No of Days 

No . of 

No. of Days 

No. of 


Absent 

Students 

A beam 

Students 


Lass than 5 

29 

leu than 30 

644 


• * *0 

224 

m „ 35 

650 


•« ■ ,1 15 

465 

„ » 40 

653 


.. 20 

562 

,* M 45 

655 


M *i 25 

634 





(Brnmrrt j, ipgft'; (2.33] 

35. From the data given below find the mode ; 


Ages 23*25 25*30 30*35 35*40 40-43 43-50 50*55 55-60 

Noof Persons 50 70 80 180 ISO 120 70 50 

12-361 

36. Following gives the distribution of the size of certain farms situated
at random from a district. Calculate the mode of the distribution.
Central size of the

farm in acres 10 20 30 40 50 60 70 

No, of farms 7 12 17 2 ft 31 3 3 

12.37) 

37. Calculate the median, 3rd decile and 20th percentile from the
following data :

Central Sis* 2 5 7 5 12 3 195 22*5 

Frequency 7 18 25 30 20 

(■2-253 

38. Monthly incomes of the families are given below in rupees :

2000, 35, 408. 15, 40, 1500, 300, 6 , ftt), 250, 20. 12. 450, 10, 150, 8 , 25. 
30,1208, 

Calculate the geometric mean and the harmonic mean of the above
series. (B.Com., Allahabad) [2.40]

39. The following table gives the time taken to solve the number of problems
in minutes. Find the mean time taken by calculating (i) arithmetic
mean, (ii) geometric mean, (iii) harmonic mean.







Time in mtnoira 

No. of 
problem* 

Time in minutes 

No. of 
problems 

0-1 

.~. 3 . 

. i—6. 

ST"'* 

1-2 

28 

6—7 

14 

.2—3 

50 

7-8 

13 

3—4 

35 

ft-9 

4 

4-5 

35 

9-10 

5 


12.421 

Attempt this question first without assuming any arbitrary mean
and then solve it by assuming a suitable arbitrary mean.

40. (a) Using the following figures show that arithmetic mean is more
affected by larger values as compared to geometric mean, while
geometric mean is more affected by smaller values as compared
to arithmetic mean.
fi, 4, 36, 150, 170. 

(b) Under what conditions (i) Arithmetic mean, (ii) Geometric mean,
and (iii) Harmonic mean are equal ? Illustrate by an example.

(c) Under what conditions (i) Mean, (ii) Mode, and (iii) Median will
be equal ? Illustrate by an example. [2.43]

41. Calculate the simple average and the weighted average of the following
items :

Item* 68 85 JO) 102 100 HO 112 113 124 )2H 143 140 !4I 153 172 
Wright* l 45 31 I II 7 2.4 17 9 !4 2 4 6 5 2 
Account for the difference in the two averages, 

< AM., AVahxihaJ, *w>) (2.15} 

42. From the following data relating to paper consumed by a press, find
the difference between the weighted average cost of paper for two
years.


Description of Paper IRatr (h>. Rs per Iba. Quaniiti Consumed in »<>n* 
1 1942 | 1943 ; 1942 ! 1943 


While 

. l'ir- 

tth m 

irr 

it 

IJi;; 

Brown 

0 

fii 6 i 0 : 7 

6 ? 

f> 

BV, 

Other* 

1 0 

\i 0 S 0 15 

o ! 

M 

10 


! ftanariit, i$y>) \'2A6' 

43. The following table gives the population and death rates at various ages
of males in England and Wales. Calculate an estimate of death rate
which can be considered as the representative of the whole population.


Age group 

Population 
in m 

Death rate 
per 1000 

0-5 

1855 

2l'9 

5—15 

3410 

19 

13-25 

MW) 

2‘7 

25-35 

2m* 

3*2 

35—45 

mz 

55 

45-55 

13% 

12*7 

35-65 

<408 

27*9 

65-75 

478 

640 

75 and above 

183 

163*4 










44. What is a weighted Average ?

Two teams A and B participated in a football league tournament
in a certain year. Below are given their goal averages :



-iw" 

6uttide 


■■■uznjpiMi 


h 4 
11 fa 

S’ S h 

1 ; U fa 

s 

< 

w 

4? 
2 £ 
n 

it 

& 

t 

> 

< 

. A ' 

■ T: . W~ 

“TJSri 28'.is 

II i 

43 

nsr 

TRT“ 

"'■"ft.’ 

\n u 

355 i 43 54 


hi 

u» 

I S3 


B has a better average as compared to A in home matches as well as
in outside matches. But the combined goal average of A is higher
than that of B. Account for the apparent contradiction.

%, * Mi $'*n 

45k, The following table give* lh* multi of certain examination of three 
tmivemtie* in the year 19%, Which i# the lie.it Cnis-endy f flive 
reason* for your answer. 

t, , . ■ Percentage Rrtutts in the I’nivenitv 

t ’hk ratty Evftrmnftlrtm ^ ^ ^ 


M A 

HO 

75 

70 

M.Sr. 

?0 

70 

Ml 

IV \ 

i‘A 

mi 

70 

IV Sc. 

00 

7o 

00 

IU -'out. 

75 

ft-5 

75 


Af.A., Calcutta, (2 47) 

46. The following are the death rates per thousand per annum of two
towns in a certain year.

(a) For each age group the death rate of Town A is greater than that
of Town B, but the reverse is the case when all age groups are
grouped together. Why is it so ?

(b) Calculate the standardised death rate for Town B taking the
population of Town A as the standard.


Town A Town 11 







BiMEfft tM 

MUklTtlM 














IKT 1 II 






















>8,Cm, //#•**.„ .AmM?*, *944$ P-MI] 















47. In construction of Cost of Living Index Number of a certain place the
following group index numbers were found. Calculate Cost of Living
Index Number by calculating (i) weighted arithmetic mean, (ii) weighted
geometric mean, and (iii) weighted harmonic mean of group Index
Numbers.


Group               Index No.    Weights

Food                   352          48
Fuel & Lighting        220          10
Clothing               230           8
House Rent             190          12
Miscellaneous          190          13


I2.30J 

48. (a) A train travels first 300 miles at an average rate of 30 miles per
hour and further travels the same distance at an average rate of
40 miles per hour. What is its average speed over the whole
distance ?

(b) Average rainfall of a certain city from Monday loSaiurdav i» ’35". 
Dor to heavy rainfall on Sunday the average of die work increase* 
to 58'. What wa» the rainfall on Sunday ? 

< ) A student found the mean of :>0 items a* 88'G. He took 50 and 
90 instead of 40 and 100 for two items, What is the correct mean ? 
(d) The increase in the price of commodity A was 20%. Then the
price decreased 25% and again increased 15%. Is it correct to
say that the resultant increase in the price was 10% ?

(e) The rate of increase in the number of cows in India is greater
than that of the population. Is it correct to say that the people
of India are now getting more milk per head ? [2.61]

49. Do you agree with the following ? 

\t‘ Kate for a certain commodity in the fust week it 4 sees* fora 
rupee and in the second week it B seers few a rupee, bo the 
4f« ... , 

average price u j seer* for a rupee. 


(ii) Usually the attendance of B.Com. I year class in a college is
40 students per day. Therefore the total attendance for 100
working days is 4000.

(iii) An ordinary person consumes 3 tolas of salt per week. So 32
crores of persons living in India will consume 24 crore seers of
salt in 5 months (1 month = 4 weeks). [2.61]

50. The following marks have been obtained in three papers of statistics in
an examination by 12 students. In which paper is the general level
of the knowledge of the students highest ? Give reasons.

A 36, 56, 41. 46, 54, 53. 55 , 51, 52, 44 . 37, 59.

H 58, 54. 21, 51, 59, 46, 65, 31, fit, 41, 70, 36

C 65, 55, 26, 40, 30, 74. 45, 29, 85. 32, 00. M

IMA, Pmjtk) (2-31)

51. The following are the monthly salaries in rupees of 30 employees of
a firm :


139, 126, 114, 100, m t 62, 77, 99, 103. 106, 

M8, IS4, 63, 69, 146, 132, 118, 142, file, 123, 

ao, 85, m> 123. 133 


129, 

104, 




The firm gave bonuses of Rs. 10, 15, 20, 25, 30 and 35 for individuals
in the respective salary groups : exceeding 60 but not exceeding
75 ; exceeding 75 but not exceeding 90, and so on up to exceeding 135
and not exceeding 150.

Find the average bonus paid.

ft Com t Hanarts, tttff)) ( 2.52J 
52. Find out the modal group from the following data. Calculate mode
by all the methods known. Account for the difference, if any.


Expense per week 

No. of 


farmbe* 

3.8 as- 

'" 4 ^ 

R* m.H as 

18 

R* 15.8 «* 

U 

Rs 20.« as. 

14 

Rv 25 8 as 

7 


Expense per week No. of 

{amities 

.Ri'f'mfi ai'r..77 

K* 35 B av 1 

Rv 40 H as. K 

R*. 45.8 as. 5 

R» 50.8 as.. \ 


Verify the value of mode graphically.





Chapter 11
Measures of Dispersion


The fundamental aim of a statistical inquiry is to demonstrate,
as precisely as possible, significant characteristics of
the data. We compute averages (mean, mode and median) in
order to characterise a series or a group. But these measures
have their limitations and may conceal much pertinent factual
information. It is also possible that these measures of central
tendency may give results which are quite misleading.

Look at Table 11.1, in which are given four distributions
with the same mean and the same number of cases, but
different variability.


TABLE 11.1

Weekly Earnings of Labourers in Four Workshops
of the Same Type

Weekly Earnings               No. of workers
Rs.            Workshop A  Workshop B  Workshop C  Workshop D

15-16             ...         ...          2          ...
17-18             ...          2           4          ...
19-20             ...          4           4           4
21-22             10          10          10          14
23-24             22          14          16          16
25-26             20          18          14          16
27-28             14          16          12          12
29-30             14          10           6          12
31-32             ...          6           6           4
33-34             ...         ...          2           2
35-36             ...         ...         ...         ...
37-38             ...         ...          4          ...

Total             80          80          80          80

Mean              25.5        25.5        25.5        25.5






Since all the four distributions have the same central tendency,
one may be led to believe that the workers of the different
workshops are almost similar as regards the wages received.
But the distribution of workers in various wage-groups differs
widely from workshop to workshop. Thus a measure of central
tendency alone is not enough to give a correct picture of a
particular distribution and for this purpose we need some
additional information. We must know :

(i) the extent to which the items in a particular distribution
are scattered around this central tendency (dispersion),

(ii) the direction of the scatteredness, whether more
items are attracted towards higher or lower values (skewness),

(iii) the extent to which the distribution is more peaked or
more flat-topped than the normal distribution (kurtosis).

Dispersion. Two series may have the same mean, mode and
median, and identical frequencies in the modal class, and yet
may differ widely in the scatter of their values about the
measures of central tendency. (See fig. 11.1 A.) The object
of measuring this scatter or dispersion is to obtain a single
summary figure which adequately exhibits the extent of the
scatter of the variates, i.e., whether the distribution is compact
or spread out.

Skewness. Statistical distributions having the same total
frequencies and identical means may differ in another respect
also. In some, the distribution of items on both sides of the
mean may be evenly spread, whereas in others the spread may
be more on one side (higher or lower) of the mean. (See
fig. 11.1 B.) If the items are evenly distributed on both sides
of the mean they are said to have a symmetrical distribution ;
if, on the other hand, there are more items on one side of the
mean than on the other, then the distribution is asymmetrical or
skewed. In the latter case we are as much interested in the
extent of skewness (or asymmetry) as in the measures of
central tendency and dispersion.

Kurtosis. Two symmetrical distributions having identical
means may differ as regards the peak of their curves. One
may be sharp peaked and the other may be flat-topped as in
fig. 11.1 C. For the correct understanding of a distribution
the measure of its peak (kurtosis) is also very essential.

Fig. 11.1

Dispersion

Definition. Dispersion may be defined as "the extent to
which the magnitudes or qualities of the items differ, that is,
the degree of diversity." \ReigUmm) 

A measure of dispersion is designed to demonstrate the
extent to which the individual measures differ on an average
from the mean or from any other positional average.

It must be noted that in measuring dispersion, the statistician
is interested in the amount or the degree of the scatteredness,
and not in its direction. As an example, a measure of 5
pounds above the mean has just as much variability as a
measure of 5 pounds below the mean. A distinction, however,
must be made between absolute dispersion and relative
dispersion.

If the object is to describe a single frequency distribution,
then absolute dispersion may be computed. But if the object
is to compare two or more different distributions, relative
dispersion will become necessary. The amount of dispersion
or absolute dispersion will be expressed in concrete units (i.e.,
in units of the problem, e.g., rupees, seers, etc.) whereas the
relative dispersion will be a pure number expressed as a ratio
or percentage.

Thus relative dispersion is the quotient obtained by
dividing the absolute dispersion by a quantity used as a
standard.

Dispersion (also called fluctuation, spread, scatter or variation)
may be measured by any one of the following :

1. Range,

2. Semi-Interquartile Range,

3. Mean Deviation,

4. Standard Deviation, and

5. Lorenz Curve.

The first and second measures make use of the limits, the
third and fourth are based on deviations and the fifth may be
termed as a graphic measure.

Range

The crudest measure of dispersion is the range of the
distribution. The range of any series is the difference between
the highest and the lowest values in the series. If the marks
received in an examination taken by 248 students are arranged
in ascending order, then the range will be equal to the
difference between the highest and the lowest marks.

In a frequency distribution, the range is taken to be the
difference between the lower limit of the class at the lower
extreme of the distribution and the upper limit of the class at
the upper extreme.

If we go back to the figures of weekly earnings of workers
in four workshops (Table 11.1), we note the following :

Workshop     Range

A              9

B             15

C             23

D             15

From the above figures it is clear that the greater the range,
the greater is the variation of the values in the group.
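The rule for a frequency distribution (upper limit of the highest occupied class less lower limit of the lowest occupied class) can be sketched in Python as follows. The frequencies are a transcription of Table 11.1 with empty classes entered as 0; the function name is illustrative only.

```python
# Lower and upper limits (Rs.) of the wage classes: (15, 16), (17, 18), ..., (37, 38).
classes = [(lo, lo + 1) for lo in range(15, 39, 2)]

# Number of workers in each class, transcribed from Table 11.1 (0 = empty class).
workers = {
    "A": [0, 0, 0, 10, 22, 20, 14, 14, 0, 0, 0, 0],
    "B": [0, 2, 4, 10, 14, 18, 16, 10, 6, 0, 0, 0],
    "C": [2, 4, 4, 10, 16, 14, 12, 6, 6, 2, 0, 4],
    "D": [0, 0, 4, 14, 16, 16, 12, 12, 4, 2, 0, 0],
}


def distribution_range(classes, frequencies):
    """Upper limit of the highest occupied class minus lower limit of the lowest."""
    occupied = [limits for limits, f in zip(classes, frequencies) if f > 0]
    return occupied[-1][1] - occupied[0][0]


ranges = {shop: distribution_range(classes, f) for shop, f in workers.items()}
print(ranges)
```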

The range is a measure of absolute dispersion, and as
such cannot be usefully employed for comparing the variability
of two distributions expressed in different units. The amount
of dispersion measured, say, in pounds, is not comparable
with dispersion measured in inches. So the need of measuring
relative dispersion arises. An absolute measure can be converted
into a relative measure if we divide it by some other
value regarded as standard for the purpose. We may use the
mean of the distribution or any other positional average as
the standard.

For Table 11.1 the relative dispersion would be :

$$\text{Workshop A} = \frac{9}{25.5} \qquad \text{Workshop B} = \frac{15}{25.5}$$

$$\text{Workshop C} = \frac{23}{25.5} \qquad \text{Workshop D} = \frac{15}{25.5}$$

An alternate method of converting an absolute variation into
a relative one would be to use the total of the extremes as the
standard. This will be equal to dividing the difference of the
extreme items by the total of the extreme items.

Thus :

$$\text{Relative dispersion} = \frac{\text{Difference of extreme items, i.e., Range}}{\text{Sum of extreme items}}$$

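The relative dispersion of the series is called the coefficient
or ratio of dispersion. In our example of weekly earnings of
workers taken above, the coefficients would be :

$$\text{Workshop A} = \frac{9}{21 + 30} = \frac{9}{51}$$

$$\text{Workshop B} = \frac{15}{17 + 32} = \frac{15}{49}$$

$$\text{Workshop C} = \frac{23}{15 + 38} = \frac{23}{53}$$

$$\text{Workshop D} = \frac{15}{19 + 34} = \frac{15}{53}$$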
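As a rough check of these coefficients, the following Python sketch divides the difference of the extreme class limits by their sum. The limits are those implied by Table 11.1, and the function name is illustrative only.

```python
# Lowest and highest class limits (Rs.) for each workshop, as implied by Table 11.1.
extremes = {"A": (21, 30), "B": (17, 32), "C": (15, 38), "D": (19, 34)}


def coefficient_of_range(lowest, highest):
    """Difference of the extreme items divided by their sum (a pure number)."""
    return (highest - lowest) / (highest + lowest)


coefficients = {
    shop: coefficient_of_range(lowest, highest)
    for shop, (lowest, highest) in extremes.items()
}
for shop, value in coefficients.items():
    print(shop, round(value, 3))
```

Expressed as decimals, the coefficients are roughly .176, .306, .434 and .283, so workshop C shows the widest relative spread and workshop A the narrowest.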


Range is an Unsatisfactory Measure of Dispersion. Though
simple and easily computable, range is one of the poorest
methods for measuring variability. First, because it is based
upon the two most extreme cases in the entire distribution. If
either the most intelligent or the least intelligent student in
the group (i.e., either of the extreme cases) happens to drop
out, the marks range may be considerably changed, while
the removal of any number of other students would not affect
it at all.



TABLE 11.2

Distribution having the Same Number of Cases,
but Different Variability

Class                  No. of Students
             Section A   Section B   Section C

0-10            ...         ...         ...
10-20            1          ...         ...
20-30           12          12          19
30-40           17          20          18
40-50           29          35          16
50-60           18          25          18
60-70           16          10          18
70-80            8           8          21
80-90            1          ...         ...
90-100          ...         ...         ...

Total          110         110         110

Range           80          60          60


The above table is designed to illustrate three distributions
with the same number of cases but different variability. The
removal of two extreme students from section A would make
its range equal to that of B or C. The greater range of A
is not a description of the entire group of 110 students but
of the two most extreme students only. Further, though
sections B and C have the same range, the students in section
B cluster more closely around the central tendency of the
group than they do in section C. Thus, the range fails to
reveal the greater homogeneity of B or the greater dispersion
of C. Due to this defect, it is seldom used as a measure of
dispersion. But in certain circumstances range is more
meaningful than any other measure. In situations where the
"extremes involve some hazard for which preparation should
be made, it may be more important to know the most extreme
cases to be encountered than to know anything else about a
distribution. An explorer would want to know the lowest
and the highest temperatures on record in the region he is
about to enter."*

Semi-Interquartile Range

Another measure of dispersion, much better than the
range, is the semi-interquartile range, usually termed as
'Quartile Deviation'.* As stated earlier, quartiles are the
points which divide the array in four equal parts. More
precisely, $Q_1$ gives the value of the item 1/4th the way up the
distribution, and $Q_3$ the value of the item 3/4th the way up the
distribution. Between $Q_1$ and $Q_3$ are included half the total
number of items. The difference between $Q_1$ and $Q_3$ includes
only the central items but excludes the extremes. Since under
most circumstances the central half of the series tends to be
fairly typical of all the items, the interquartile range ($Q_3 - Q_1$)
affords a convenient and often a good indicator of the absolute
variability. The larger the interquartile range, the larger
the variability.

Usually, one-half of the difference between $Q_3$ and $Q_1$ is
used and to it is given the name of Quartile Deviation, or
semi-interquartile range. The interquartile range is divided by
two for the reason that half of the interquartile range will, in
a normal distribution, be equal to the difference between the
median and any quartile. This means that 50 per cent items
of a normal distribution will lie within the interval defined by
the median plus and minus the semi-interquartile range.
Symbolically :

$$\text{Q.D.} = \frac{Q_3 - Q_1}{2}$$


* Walker.

* This term was invented by Galton.









As shown in the above table, the Q.D. of workshop A is Rs. 2.1
(or Rs. 2.12) and the Median Value is Rs. 25.3. This means that if
the distribution is symmetrical the number of workers, whose
wages vary between (25.3 - 2.1) Rs. 23.2 and (25.3 + 2.1)
Rs. 27.4, shall be just half of the total cases. The other half of
the workers will be more than Rs. 2.1 removed from the median
wage. In this distribution the distance between $Q_1$ and
the median ($Q_2$) is not the same as between $Q_3$ and the
median, hence the interval defined by median plus and minus
semi-interquartile range will not be exactly the same as given
by the value of the two quartiles. Under such conditions the
range between Rs. 23.2 and Rs. 27.4 will not include precisely
50 per cent of the workers.
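The point may be seen numerically in the following Python sketch. It assumes, for workshop A, the quartiles 23.4 and 27.64 that appear in the coefficient computation later in this section and the median 25.3 quoted above; the variable names are illustrative.

```python
# Quartiles and median (Rs.) assumed for workshop A.
q1, median, q3 = 23.4, 25.3, 27.64

# Quartile deviation: half of the interquartile range.
qd = (q3 - q1) / 2

# Interval defined by the median plus and minus the quartile deviation.
interval = (median - qd, median + qd)

# Distances of the two quartiles from the median; these are equal
# only when the distribution is symmetrical.
lower_gap = median - q1
upper_gap = q3 - median

print(round(qd, 2), tuple(round(v, 2) for v in interval))
print(round(lower_gap, 2), round(upper_gap, 2))
```

Since the upper quartile lies farther from the median than the lower one, the interval median ± Q.D. is shifted relative to the interval from $Q_1$ to $Q_3$, which is why it need not contain exactly half of the workers.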

It can thus be said that quartile deviation gives us an
indication about the symmetry of a distribution, and the
interquartile range provides us with the width of the interval in
which 50 per cent of the items should be.

Greater care is needed in comparing absolute measures of
dispersion. It must be recognised that the Q.D. of some
series (weekly wages, for instance) would be a smaller value
than the Q.D. of a distribution of annual wages expressed in
different units.

If the distributions to be compared are in different units,
it is advisable to convert the absolute measures of dispersion.

As pointed out earlier, we may use any measure of central
tendency as a standard for converting absolute dispersion into
relative dispersion. In this case median is preferable,
convenience being the main recommendation of this method.
Applied to our example of weekly earnings the relative
dispersion would be :


Relative Dispersion = Q.D. / Median

Workshop A :   2.12 / 25.3   = .084

Workshop B :   2.46 / 25.6   = .096

Workshop C :   2.83 / 25.07  = .113

Workshop D :   2.71 / 25.3   = .107





Another standard, which is commonly used, is the average
of the two quartiles. Symbolically :

Relative Dispersion or Coefficient of Dispersion

$$= \frac{(Q_3 - Q_1)/2}{(Q_3 + Q_1)/2} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$


Applying the formula to our example,

Coefficient of variation = ($Q_3 - Q_1$) / ($Q_3 + Q_1$)

Workshop A :   (27.64 - 23.4) / (27.64 + 23.4)    = .083

Workshop B :   (28 - 23.07) / (28 + 23.07)        = .097

Workshop C :   (28.17 - 22.5) / (28.17 + 22.5)    = .112

Workshop D :   (28.17 - 22.75) / (28.17 + 22.75)  = .106
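The same figures can be reproduced with a small Python sketch. The quartile values are those read from the computation above (the printed digits are partly illegible, so they should be treated as an interpretation), and the function names are illustrative only.

```python
# (Q1, Q3) in rupees for each workshop, as read from the worked example.
quartiles = {
    "A": (23.4, 27.64),
    "B": (23.07, 28.0),
    "C": (22.5, 28.17),
    "D": (22.75, 28.17),
}


def quartile_deviation(q1, q3):
    """Semi-interquartile range: an absolute measure in the units of the data."""
    return (q3 - q1) / 2


def coefficient_of_quartile_deviation(q1, q3):
    """Relative measure: (Q3 - Q1) / (Q3 + Q1), a pure number."""
    return (q3 - q1) / (q3 + q1)


for shop, (q1, q3) in quartiles.items():
    qd = quartile_deviation(q1, q3)
    coeff = coefficient_of_quartile_deviation(q1, q3)
    print(shop, round(qd, 3), round(coeff, 3))
```

On either measure the workshops are ranked in the same order here: A is the most compact and C the most dispersed.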

Mean Deviation


A weakness of the measures of dispersion discussed above,
based upon the range or a portion thereof, is that the precise
size of most of the variates has no effect on the result. As an
illustration, the quartile deviation will be the same whether the
variates between $Q_1$ and $Q_3$ are concentrated just above $Q_1$
or they are spread uniformly from $Q_1$ to $Q_3$. This is an
important defect from the viewpoint of measuring the divergence
of the distribution from its typical value. The mean deviation
is employed to answer the objection.

Mean deviation, also called average deviation, of a frequency
distribution is the mean of the absolute values of the
deviations from some measure of central tendency. In other
words, mean deviation is the arithmetic average of the variations
(deviations) of the individual items of the series from a measure
of their central tendency. This measure of dispersion makes
use of all the observations or items of the data. Let our
distribution be as follows :

90, 95, 103, 120, 80, 97, 107, 100, 112 and 96.






MS4S0BS6 OF D18F£MIOIf 227 

We have 10 values in the distribution and the method of 
calculating dispersion it as follows : 

The mean of these items is 100, Now the first item differs 
from the mean by 1Q> the second item differs by* 5, the third 
by 3 and so on. 

If wc average ail the deviations we get : 

104 5+34-20+20+3+7+0+12 + 4 84 

-...- w -- 1¥ «8'4 

This is the average amount by which the items differ from 
the mean, and is called the mean deviation or average vari* 

at ion. 

Now in the above method, we have ignored the fact that
some of the deviations were positive and some negative. In
fact we should have written the deviation of the first item as
-10, of the second -5, of the third +3, of the fourth +20, and
so on. If we do not ignore the signs and add algebraically, we
shall find that the sum of the deviations will be zero ; thus
adding -10, -5, +3, +20, -20, -3, +7, 0, +12, -4, the total
is zero. We have explained in Chapter 10 that an important
characteristic of the arithmetic mean of any group of values
is that the algebraic sum of the deviations from it is zero. That
is why it is necessary in computing the mean deviation to
ignore altogether the positive (+) and negative (-) signs of
deviations. This, however, is an unhealthy sign, because
'once the signs of the deviations have been thrown away, the
algebraic nature of any measure based upon them is lost'. But
as emphasised in the beginning of this chapter, the statistician,
while measuring dispersion, is interested in the amount and not
in the direction of the variation and for that reason the +ve
and -ve signs can be ignored.

In the above example we have calculated the deviations
from the mean, but we may also find out the deviations from
the mode,1 or from the median. The median is preferred to
others for the simple reason that the mean deviation or
average deviation from the median will be minimum.

1 It is not a common practice to record deviations from the mode.
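The arithmetic of this example can be checked with a short Python sketch. The ten items are those whose deviations from 100 are listed in the text; the function name `mean_deviation` is an illustrative choice, not something taken from the book.

```python
from statistics import mean, median

# The ten items of the example distribution.
values = [90, 95, 103, 120, 80, 97, 107, 100, 112, 96]


def mean_deviation(data, centre):
    """Average of the absolute deviations of `data` from `centre`."""
    return sum(abs(x - centre) for x in data) / len(data)


centre = mean(values)                          # arithmetic mean of the items
signed_sum = sum(x - centre for x in values)   # deviations with signs retained
md_from_mean = mean_deviation(values, centre)
md_from_median = mean_deviation(values, median(values))

print("mean:", centre)
print("sum of signed deviations:", signed_sum)
print("mean deviation from the mean:", md_from_mean)
print("mean deviation from the median:", md_from_median)
```

With the signs retained the deviations cancel out, which is why they have to be dropped before averaging. The deviation from the median can never exceed the deviation from the mean; in this particular series the two happen to be equal.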




Steps in the computation of mean deviation in individual observation
series :

1. Calculate the median of the array.

2. Record the deviations $|d|$ of each item in the group
from the median.

3. Add these deviations, $\Sigma |d|$ (signs ignored).

4. $\Sigma |d|$ then should be divided by the total number of
items.

Symbolically :

$$\text{M.D.} = \frac{\Sigma |d|}{N}$$


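Illustration :

The following are the rents of 18 houses in a certain
locality :

| Rs. | as. | Rs. | as. |
|-----|-----|-----|-----|
| 6 | 8 | 6 | 4 |
| 5 | 0 | 3 | 0 |
| 5 | 4 | 9 | 0 |
| 5 | 8 | 4 | 8 |
| 5 | 4 | 4 | 0 |
| 4 | 12 | 3 | 0 |
| 4 | 0 | 3 | 12 |
| 5 | 0 | 5 | 0 |
| 4 | 8 | 3 | 0 |

Calculate the mean deviation of this group.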


(&. Cm.., Lucknow, t$$b). 







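| Houses | Monthly rents in annas | Deviation from median (78 annas), + and - signs ignored |
|--------|------------------------|------------------------|
| 1 | 48 | 30 |
| 2 | 48 | 30 |
| 3 | 48 | 30 |
| 4 | 60 | 18 |
| 5 | 64 | 14 |
| 6 | 64 | 14 |
| 7 | 72 | 6 |
| 8 | 72 | 6 |
| 9 | 76 | 2 |
| 10 | 80 | 2 |
| 11 | 80 | 2 |
| 12 | 80 | 2 |
| 13 | 84 | 6 |
| 14 | 84 | 6 |
| 15 | 88 | 10 |
| 16 | 100 | 22 |
| 17 | 104 | 26 |
| 18 | 144 | 66 |
| Total | | 292 |

Median = size of $\frac{18+1}{2}$th item $= \frac{76+80}{2} = 78$ as.

Mean deviation from Median $= \frac{\Sigma |d|}{N} = \frac{292}{18} = 16.2$ as.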







Steps in Computation of M.D. from the Median in Frequency
Series :

1. Calculate the median of the distribution.

2. Write down the mid-points of each class.

3. Record deviations of each class mark (mid-value) from
the median ($|d|$).

4. Multiply the frequency of each class by the deviation
of its class-mark from the median ($f|d|$).

5. Add the products obtained, ignoring signs ($\Sigma f|d|$).

Symbolically : $\text{M.D.} = \frac{\Sigma f|d|}{N}$

Similarly the M.D. from the arithmetic mean etc., may be
computed.

Example

Calculate the mean deviation from the following data :

| X | f |
|-------|----|
| 0-10 | 18 |
| 10-20 | 16 |
| 20-30 | 15 |
| 30-40 | 12 |
| 40-50 | 10 |
| 50-60 | 5 |
| 60-70 | 2 |
| 70-80 | 2 |


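Solution :

| X | mid-value | f | less than c.f. | deviation from median, $\|d\|$ | $f\|d\|$ |
|-------|----|----|----|----|-------|
| 0-10 | 5 | 18 | 18 | 19 | 342 |
| 10-20 | 15 | 16 | 34 | 9 | 144 |
| 20-30 | 25 | 15 | 49 | 1 | 15 |
| 30-40 | 35 | 12 | 61 | 11 | 132 |
| 40-50 | 45 | 10 | 71 | 21 | 210 |
| 50-60 | 55 | 5 | 76 | 31 | 155 |
| 60-70 | 65 | 2 | 78 | 41 | 82 |
| 70-80 | 75 | 2 | 80 | 51 | 102 |
| Total | | 80 | | | 1,182 |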








Median = the size of $\frac{80}{2}$th item

$= 20 + \frac{10}{15} \times 6 = 24$

Mean deviation $= \frac{1{,}182}{80} = 14.775$

The coefficient or relative dispersion is found by dividing
the mean deviation by that measure of central tendency about
which deviations were recorded.

Thus :

Coefficient of M.D. $= \frac{\text{M.D.}}{\text{Mean}}$ (when deviations were recorded from the mean)

or $\frac{\text{M.D.}}{\text{Median}}$ (when deviations were recorded from the median)

or $\frac{\text{M.D.}}{\text{Mode}}$ (when deviations were recorded from the mode)

Applying the above formulae to our previous example we
get :

Coefficient of mean deviation $= \frac{14.775}{24} = .616$
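The grouped calculation above can be sketched in Python as follows. The helper names `grouped_median` and `grouped_mean_deviation` are illustrative assumptions; the median is found by the usual interpolation within the median class, and every class is assumed to be concentrated at its mid-value.

```python
# Each class is (lower limit, upper limit, frequency), as in the example.
classes = [(0, 10, 18), (10, 20, 16), (20, 30, 15), (30, 40, 12),
           (40, 50, 10), (50, 60, 5), (60, 70, 2), (70, 80, 2)]


def grouped_median(table):
    """Interpolate the median inside the class holding the N/2-th item."""
    half = sum(f for _, _, f in table) / 2
    cumulative = 0
    for lower, upper, f in table:
        if cumulative + f >= half:
            return lower + (half - cumulative) / f * (upper - lower)
        cumulative += f


def grouped_mean_deviation(table, centre):
    """Sum of f|d| over N, with d measured from each class mid-value."""
    n = sum(f for _, _, f in table)
    total = sum(f * abs((lower + upper) / 2 - centre)
                for lower, upper, f in table)
    return total / n


med = grouped_median(classes)
md = grouped_mean_deviation(classes, med)
coefficient = md / med

print("median:", med)
print("mean deviation:", md)
print("coefficient of mean deviation:", round(coefficient, 3))
```

The same two functions work for deviations from the mean: pass the arithmetic mean instead of the median as `centre`.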

Characteristics of Mean Deviation. The mean deviation
method of measuring dispersion is a very satisfactory method
as it gives due consideration to the scatter of every item in the
series. Further it is based on all observations and unlike Q.D.
it does not merely measure the dispersion of a part of the
series. In economics and business statistics this method is
commonly used.

This method, however, lacks those algebraic properties
which would facilitate its computation and establish its relation
to other measures. It is not particularly easy to compute. It
has no advantage whatever over the standard deviation, to be
discussed now.

Standard Deviation 

By far the most universally used and the most useful measure
of dispersion is the standard deviation or root-mean-square
deviation about the mean. We have seen that all the methods
of measuring dispersion so far discussed are not universally
adopted for want of adequacy and accuracy. The range is not
satisfactory as its magnitude is determined by most extreme
cases in the entire group. Further, the range is unstable
because it is dependent on the item whose size is largely a matter
of chance. Mean deviation method is also an unsatisfactory
measure of scatter, as it ignores the algebraic signs of deviation.
We desire a measure of scatter which is free from these
shortcomings. To some extent standard deviation is one such
measure.

The calculation of standard deviation differs in the following
respects from that of mean deviation. First, in calculating
standard deviation, the deviations are squared. This is done so
as to get rid of negative signs without committing algebraic
violence. Further, the squaring of deviations provides added
weight to the extreme items, a desirable feature for certain
types of series.

Secondly, the deviations are always recorded from the
arithmetic mean, because, although $\Sigma |d|$ is the minimum from
the median, $\Sigma d^2$ is minimum when $d$ are measured from the
arithmetic average.

Thus standard deviation is the square-root of the mean of
the squares of the deviations of individual items from their
arithmetic mean.

Standard deviation is always designated by the small Greek
letter sigma ($\sigma$). This symbol was first used by Karl Pearson.1

1 There is a tendency to distinguish between the statistic observed in
particular samples and the corresponding unknown value in the universe
from which those samples are drawn, by using the Roman letter (S) for
the sample statistic and $\sigma$ for the population parameter.





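Symbolically :

$$\sigma = \sqrt{\frac{\Sigma x^2}{N}} \qquad (1)$$

where $x = X - \bar{X}$.

As an illustration suppose we have the following data :

11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21.

The computation of $\sigma$ from the above data will involve the
following calculation :

| X | x | x² |
|-----|----|-----|
| 11 | -5 | 25 |
| 12 | -4 | 16 |
| 13 | -3 | 9 |
| 14 | -2 | 4 |
| 15 | -1 | 1 |
| 16 | 0 | 0 |
| 17 | +1 | 1 |
| 18 | +2 | 4 |
| 19 | +3 | 9 |
| 20 | +4 | 16 |
| 21 | +5 | 25 |
| 176 | | 110 |

$\Sigma X = 176$, $N = 11$

$\bar{X} = \frac{176}{11} = 16$

$\Sigma x^2 = 110$

$\sigma = \sqrt{\frac{110}{11}} = \sqrt{10} = 3.16$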


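Formula (1) can also be applied to find $\sigma$ in a frequency
distribution when its form would be :

$$\sigma = \sqrt{\frac{\Sigma f x^2}{N}}$$

Find the standard deviation of the data in the following
distribution :

TABLE 11.3

| X | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 20 |
|---|----|----|----|----|----|----|----|----|
| f | 4 | 11 | 32 | 21 | 15 | 8 | 5 | 4 |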


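Solution :

| X | f | fX | $(X - \bar{X}) = x$ | x² | fx² |
|----|-----|-------|----|----|-----|
| 12 | 4 | 48 | -3 | 9 | 36 |
| 13 | 11 | 143 | -2 | 4 | 44 |
| 14 | 32 | 448 | -1 | 1 | 32 |
| 15 | 21 | 315 | 0 | 0 | 0 |
| 16 | 15 | 240 | 1 | 1 | 15 |
| 17 | 8 | 136 | 2 | 4 | 32 |
| 18 | 5 | 90 | 3 | 9 | 45 |
| 20 | 4 | 80 | 5 | 25 | 100 |
| | 100 | 1,500 | | | 304 |

$N = 100$, $\Sigma fX = 1{,}500$, $\Sigma fx^2 = 304$

$\bar{X} = \frac{\Sigma fX}{N} = \frac{1{,}500}{100} = 15$

$\sigma = \sqrt{\frac{\Sigma fx^2}{N}} = \sqrt{\frac{304}{100}} = \sqrt{3.04} = 1.74$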

In practical work it seldom happens that the mean of a
series is a whole number. If the mean is fractional it would be
a laborious job to square all the fractional deviations between
the actual values and their mean. Therefore when the mean
is not a whole number the following formula may be used :

$$\sigma = \sqrt{\frac{\Sigma X^2}{N} - \left(\frac{\Sigma X}{N}\right)^2} \qquad (2)$$


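| X | X² |
|-----|-------|
| 11 | 121 |
| 12 | 144 |
| 13 | 169 |
| 14 | 196 |
| 15 | 225 |
| 16 | 256 |
| 17 | 289 |
| 18 | 324 |
| 19 | 361 |
| 20 | 400 |
| 21 | 441 |
| 176 | 2,926 |

$\bar{X} = \frac{\Sigma X}{N} = \frac{176}{11} = 16$

$\sigma = \sqrt{\frac{2{,}926}{11} - 16^2} = \sqrt{266 - 256} = \sqrt{10} = 3.16$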








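This formula when applied to a frequency table will read as
under :

$$\sigma = \sqrt{\frac{\Sigma fX^2}{N} - \left(\frac{\Sigma fX}{N}\right)^2} \qquad (2A)$$

Note :

It may be mentioned here that formulae (2) and (2A) are
derived from formula (1) :

$$\Sigma x^2 = \Sigma (X - \bar{X})^2 = (X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \dots + (X_n - \bar{X})^2$$

But

$$(X_1 - \bar{X})^2 = X_1^2 - 2X_1\bar{X} + \bar{X}^2$$
$$(X_2 - \bar{X})^2 = X_2^2 - 2X_2\bar{X} + \bar{X}^2$$
$$\dots$$
$$(X_n - \bar{X})^2 = X_n^2 - 2X_n\bar{X} + \bar{X}^2$$

Summing up, we get

$$\Sigma (X - \bar{X})^2 = \Sigma X^2 - 2\bar{X}\,\Sigma X + N\bar{X}^2$$

Dividing by $N$ we get

$$\frac{\Sigma (X - \bar{X})^2}{N} = \frac{\Sigma X^2}{N} - 2\bar{X}\,\frac{\Sigma X}{N} + \bar{X}^2 = \frac{\Sigma X^2}{N} - \bar{X}^2$$

or

$$\sigma = \sqrt{\frac{\Sigma X^2}{N} - \left(\frac{\Sigma X}{N}\right)^2}$$

Similarly, $\sqrt{\frac{\Sigma f(X - \bar{X})^2}{N}}$ can be proved to be
equal to $\sqrt{\frac{\Sigma fX^2}{N} - \left(\frac{\Sigma fX}{N}\right)^2}$.

Its use can be seen from the following illustration :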







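Find the standard deviation of the data in Table 11.3 with
the help of formula (2A).

| X | f | fX | fX² |
|----|-----|-------|--------|
| 12 | 4 | 48 | 576 |
| 13 | 11 | 143 | 1,859 |
| 14 | 32 | 448 | 6,272 |
| 15 | 21 | 315 | 4,725 |
| 16 | 15 | 240 | 3,840 |
| 17 | 8 | 136 | 2,312 |
| 18 | 5 | 90 | 1,620 |
| 20 | 4 | 80 | 1,600 |
| | 100 | 1,500 | 22,804 |

$$\sigma = \sqrt{\frac{22{,}804}{100} - \left(\frac{1{,}500}{100}\right)^2} = \sqrt{228.04 - 225} = \sqrt{3.04} = 1.74$$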


This method also becomes very laborious when figures in
the X column are large. A method which will be quite easy
to work in most distributions can be stated as :

$$\sigma = \sqrt{\frac{\Sigma x'^2}{N} - \left(\frac{\Sigma x'}{N}\right)^2} \qquad (3)$$

When applied to frequency distribution its form will be

$$\sigma = \sqrt{\frac{\Sigma fx'^2}{N} - \left(\frac{\Sigma fx'}{N}\right)^2} \qquad (3A)$$

where $x'$ represents deviations from the assumed mean. The use
of the formula is illustrated in the example below :




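Assumed Mean = 16

| X | f | $x' = (X - A)$ | $fx'$ | $fx'^2$ |
|----|-----|----|------|-----|
| 12 | 4 | -4 | -16 | 64 |
| 13 | 11 | -3 | -33 | 99 |
| 14 | 32 | -2 | -64 | 128 |
| 15 | 21 | -1 | -21 | 21 |
| 16 | 15 | 0 | 0 | 0 |
| 17 | 8 | 1 | 8 | 8 |
| 18 | 5 | 2 | 10 | 20 |
| 20 | 4 | 4 | 16 | 64 |
| | 100 | | -100 | 404 |

$$\sigma = \sqrt{\frac{\Sigma fx'^2}{N} - \left(\frac{\Sigma fx'}{N}\right)^2} = \sqrt{\frac{404}{100} - \left(\frac{-100}{100}\right)^2} = \sqrt{4.04 - 1} = \sqrt{3.04} = 1.74$$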
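The three routes to the standard deviation of Table 11.3 (deviations from the actual mean, formula (2A) and formula (3A)) can be compared in a short Python sketch. The function names are illustrative assumptions; the data are the X and f columns of the worked tables.

```python
import math

# Table 11.3: value -> frequency.
table = {12: 4, 13: 11, 14: 32, 15: 21, 16: 15, 17: 8, 18: 5, 20: 4}
n = sum(table.values())


def sd_from_actual_mean(data):
    """Deviations x = X - mean, then sqrt(sum(f * x^2) / N)."""
    mean = sum(x * f for x, f in data.items()) / n
    return math.sqrt(sum(f * (x - mean) ** 2 for x, f in data.items()) / n)


def sd_formula_2a(data):
    """sqrt(sum(f * X^2) / N - (sum(f * X) / N)^2)."""
    mean = sum(x * f for x, f in data.items()) / n
    return math.sqrt(sum(f * x * x for x, f in data.items()) / n - mean ** 2)


def sd_formula_3a(data, assumed_mean):
    """Same shape as (2A), but with deviations x' = X - A."""
    s1 = sum(f * (x - assumed_mean) for x, f in data.items())
    s2 = sum(f * (x - assumed_mean) ** 2 for x, f in data.items())
    return math.sqrt(s2 / n - (s1 / n) ** 2)


results = [sd_from_actual_mean(table), sd_formula_2a(table),
           sd_formula_3a(table, 16)]
print([round(r, 2) for r in results])
```

Any assumed mean gives the same answer; choosing one near the centre of the data merely keeps the products small.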


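Note :

It may be shown here that $\frac{\Sigma x^2}{N}$, where $x$ is equal to $(X - \bar{X})$,
can be obtained by deducting from $\frac{\Sigma x'^2}{N}$, where $x'$ is $(X - A)$,
the square of $\frac{\Sigma x'}{N}$.

$$x = X - \bar{X} = (X - A) - (\bar{X} - A) = x' - \frac{\Sigma x'}{N}, \quad \text{since } \bar{X} = A + \frac{\Sigma x'}{N} \text{ (Ref. Chap. 10)}$$

Hence

$$\Sigma x^2 = \Sigma x'^2 - 2\,\frac{\Sigma x'}{N}\,\Sigma x' + N\left(\frac{\Sigma x'}{N}\right)^2 = \Sigma x'^2 - N\left(\frac{\Sigma x'}{N}\right)^2$$

or

$$\frac{\Sigma x^2}{N} = \frac{\Sigma x'^2}{N} - \left(\frac{\Sigma x'}{N}\right)^2$$

Similarly, it can be proved in the case of a frequency distribution
that

$$\sigma = \sqrt{\frac{\Sigma fx'^2}{N} - \left(\frac{\Sigma fx'}{N}\right)^2}$$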





Calculation of $\sigma$ from frequency distribution with class intervals.
If the data of the series are grouped into classes, the standard
deviation may be computed either from deviations taken from
the actual mean or from deviations taken from an assumed
mean, as in formula (3A).

In either case the usual assumption that the frequency is
concentrated at the mid-point of the class interval is made.
Since the mean is generally not a whole number it is the latter
method that is used frequently.

Under this method we obtain deviations ($x'$) from an
assumed mean (or an arbitrary origin). It must be carefully
noted that the assumed mean is always placed at the mid-point
of a class interval.

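Illustration :

Find $\sigma$ of the following distribution :

| X | f |
|-------|----|
| 0-10 | 18 |
| 10-20 | 16 |
| 20-30 | 15 |
| 30-40 | 12 |
| 40-50 | 10 |
| 50-60 | 5 |
| 60-70 | 2 |
| 70-80 | 1 |

| X | mid value | f | $x'$ (deviation from 25) | $fx'$ | $fx'^2$ |
|-------|----|----|-----|------|--------|
| 0-10 | 5 | 18 | -20 | -360 | 7,200 |
| 10-20 | 15 | 16 | -10 | -160 | 1,600 |
| 20-30 | 25 | 15 | 0 | 0 | 0 |
| 30-40 | 35 | 12 | 10 | 120 | 1,200 |
| 40-50 | 45 | 10 | 20 | 200 | 4,000 |
| 50-60 | 55 | 5 | 30 | 150 | 4,500 |
| 60-70 | 65 | 2 | 40 | 80 | 3,200 |
| 70-80 | 75 | 1 | 50 | 50 | 2,500 |
| | | 79 | | 600 - 520 = 80 | 24,200 |

$$\sigma = \sqrt{\frac{\Sigma fx'^2}{N} - \left(\frac{\Sigma fx'}{N}\right)^2} = \sqrt{\frac{24{,}200}{79} - \left(\frac{80}{79}\right)^2} = \sqrt{305.3} = 17.5$$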

The calculation work can be very much reduced if the
deviations are obtained in terms of class interval units as shown
below. It may be mentioned that this method is the best and
easiest wherever N is large.


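| X | mid value | f | $x'$ (deviation from 25 in class interval units) | $fx'$ | $fx'^2$ |
|-------|----|----|----|-----|-----|
| 0-10 | 5 | 18 | -2 | -36 | 72 |
| 10-20 | 15 | 16 | -1 | -16 | 16 |
| 20-30 | 25 | 15 | 0 | 0 | 0 |
| 30-40 | 35 | 12 | 1 | 12 | 12 |
| 40-50 | 45 | 10 | 2 | 20 | 40 |
| 50-60 | 55 | 5 | 3 | 15 | 45 |
| 60-70 | 65 | 2 | 4 | 8 | 32 |
| 70-80 | 75 | 1 | 5 | 5 | 25 |
| | | 79 | | 60 - 52 = 8 | 242 |

$$\sigma = i \times \sqrt{\frac{\Sigma fx'^2}{N} - \left(\frac{\Sigma fx'}{N}\right)^2}$$

where $i$ represents class interval

$$= 10 \times \sqrt{\frac{242}{79} - \left(\frac{8}{79}\right)^2} = 10 \times 1.75 = 17.5$$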




Steps in the computation of standard deviation may be
summarised as under :

1. Select an assumed mean as some mid-point value of an
interval near the concentration point of the data.

2. Record the deviations ($x'$) by writing down the number
of classes by which the given mid-point is away from the
assumed mean (provided the class intervals are equal). Thus
the mid-point of the class just preceding the assumed mean is -1
class interval, and the mid-point of the higher class, +1 class
interval away from the arbitrary origin.

3. Multiply each deviation ($x'$) by its class frequency
retaining the algebraic signs.

4. Find the algebraic sum of $fx'$ and divide by $N$, thus
obtaining $\frac{\Sigma fx'}{N}$. The square of $\frac{\Sigma fx'}{N}$ will be the correction
factor.

5. Square the deviations by multiplying $fx'$ by $x'$ and add
the products, obtaining $\Sigma fx'^2$.

6. The result of (5) should be divided by $N$, thus resulting
in $\frac{\Sigma fx'^2}{N}$ which is the average squared deviation from the
assumed mean.

7. The square-root of the difference between average
squared deviation, i.e., $\frac{\Sigma fx'^2}{N}$, and the correction factor, i.e.,
$\left(\frac{\Sigma fx'}{N}\right)^2$, when multiplied by the class interval will give us
the standard deviation.

Thus $\sigma = i \times \sqrt{\frac{\Sigma fx'^2}{N} - \left(\frac{\Sigma fx'}{N}\right)^2}$

It may be noted that $\sigma^2$ is the variance (see Chapter 12).
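The seven steps can be sketched in Python as below. The function name `sd_step_deviation` and its parameters are illustrative assumptions; the data are the classes of the illustration worked above (N = 79), with the assumed mean at 25 and a class interval of 10.

```python
import math

# Each class is (lower limit, upper limit, frequency).
classes = [(0, 10, 18), (10, 20, 16), (20, 30, 15), (30, 40, 12),
           (40, 50, 10), (50, 60, 5), (60, 70, 2), (70, 80, 1)]


def sd_step_deviation(table, assumed_mean, width):
    """Standard deviation from deviations measured in class-interval units."""
    n = sum(f for _, _, f in table)
    sum_fx = 0.0    # algebraic sum of f * x'   (steps 3 and 4)
    sum_fx2 = 0.0   # sum of f * x'^2           (step 5)
    for lower, upper, f in table:
        step = ((lower + upper) / 2 - assumed_mean) / width   # step 2
        sum_fx += f * step
        sum_fx2 += f * step * step
    correction = (sum_fx / n) ** 2                            # step 4
    return width * math.sqrt(sum_fx2 / n - correction)        # steps 6 and 7


sigma = sd_step_deviation(classes, assumed_mean=25, width=10)
print(round(sigma, 1))
```

Because the correction factor removes the effect of the arbitrary origin, moving the assumed mean to another mid-point (say 35) leaves the result unchanged.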
Properties of standard deviation. Of the various measures of
absolute dispersion, standard deviation is by far the most
important and is most widely used. This is due to its various
properties, viz., (1) every value in the series is used in its
computation, (2) since the deviations are taken from the mean the
sum of their squares is always the minimum, (3) unlike the
mean deviation it is adapted to further algebraic treatment,
(4) in the interval extending to a distance $\sigma$ on either side of
the mean of a normal distribution 68.27 per cent of the items
are included ; in the interval the mean ± 2$\sigma$, 95.45 per cent of
the cases will lie ; and within the interval $\bar{X}$ ± 3$\sigma$, 99.73 per
cent, or nearly all of the items are included.
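The three percentages quoted in property (4) follow from the normal curve and can be reproduced with the error function in Python's standard library. The helper name `share_within` is an illustrative assumption; the identity used is that the share of a normal distribution lying within $k$ standard deviations of the mean equals erf($k/\sqrt{2}$).

```python
import math


def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"mean ± {k} sigma: {100 * share_within(k):.2f} per cent")
```

These shares hold only for a normal distribution; for a skew distribution the percentages will differ.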

Coefficient of variation or relative standard deviation. The
standard deviation that we have discussed in the preceding
paragraph is an absolute measure of the scatter of the various
values about the mean. This measure is always expressed in
terms of the units of the problem which may be rupees, seers,
inches, years, etc. This absolute measure of dispersion cannot
be used for purposes of comparing the variability of two or
more series because,

(i) the standard deviations of the two series may be in two
different units ; and

(ii) the means of the two distributions may be quite different.

Thus whenever it is desired to compare the dispersions of
two or more series which are not expressed in the same units,
or whose means are not the same (approximately at least) we
will have to compute coefficient of variation for all such
distributions in the following manner :

$$V = \frac{\sigma}{M} \times 100$$

where V stands for Coefficient of variation
M ,, ,, Mean
$\sigma$ ,, ,, Standard deviation

Since the coefficient of variation is expressed as a
percentage, $\frac{\sigma}{M}$ is multiplied by 100. Since V is a pure number
it is suited for purposes of comparison.



Illustration :

The following are the scores of two batsmen A and B in a
series of innings :

A 12 115 6 73 7 19 119 36 84 29

B 47 12 76 42 4 51 37 48 13 0

Who is the better run-getter ?

Who is more consistent ? (Raj., 1954)

Solution :

In order to decide as to which of the two batsmen, A and
B, is the better run-getter, we should find their batting average.
The one whose average is higher will be considered as a better
batsman.

To determine the consistency in batting we should
determine the coefficient of variation. The less this coefficient the
more consistent will be the player.



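| A: Scores X | x | x² | B: Scores X | x | x² |
|-----|-----|--------|-----|-----|-------|
| 12 | -38 | 1,444 | 47 | 14 | 196 |
| 115 | +65 | 4,225 | 12 | -21 | 441 |
| 6 | -44 | 1,936 | 76 | 43 | 1,849 |
| 73 | +23 | 529 | 42 | 9 | 81 |
| 7 | -43 | 1,849 | 4 | -29 | 841 |
| 19 | -31 | 961 | 51 | 18 | 324 |
| 119 | +69 | 4,761 | 37 | 4 | 16 |
| 36 | -14 | 196 | 48 | 15 | 225 |
| 84 | +34 | 1,156 | 13 | -20 | 400 |
| 29 | -21 | 441 | 0 | -33 | 1,089 |
| $\Sigma X$ = 500 | | 17,498 | $\Sigma X$ = 330 | | 5,462 |

Batsman A

$\bar{X} = \frac{500}{10} = 50$

$\sigma = \sqrt{\frac{17{,}498}{10}} = 41.83$

$V = \frac{41.83 \times 100}{50} = 83.66$ per cent.

Batsman B

$\bar{X} = \frac{330}{10} = 33$

$\sigma = \sqrt{\frac{5{,}462}{10}} = 23.37$

$V = \frac{23.37 \times 100}{33} = 70.8$ per cent.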






A is a better batsman since his average is 50 as compared
to 33 of B. But B is more consistent since the variation in his
case is 70.8 as compared to 83.66 of A.
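The comparison of the two batsmen can be reproduced with a small Python sketch. The function name `coefficient_of_variation` is an illustrative assumption, and the scores are taken as they appear in the solution table (totals 500 and 330).

```python
import math


def coefficient_of_variation(scores):
    """Return (mean, standard deviation, V) with V = 100 * sigma / mean."""
    n = len(scores)
    mean = sum(scores) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in scores) / n)
    return mean, sigma, 100 * sigma / mean


batsman_a = [12, 115, 6, 73, 7, 19, 119, 36, 84, 29]
batsman_b = [47, 12, 76, 42, 4, 51, 37, 48, 13, 0]

mean_a, sigma_a, v_a = coefficient_of_variation(batsman_a)
mean_b, sigma_b, v_b = coefficient_of_variation(batsman_b)

print(f"A: mean {mean_a:.0f}, sigma {sigma_a:.2f}, V {v_a:.2f} per cent")
print(f"B: mean {mean_b:.0f}, sigma {sigma_b:.2f}, V {v_b:.2f} per cent")
```

The higher mean identifies the better run-getter and the lower V the more consistent player, so the two questions can have different answers, as they do here.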

Illustration :

The following table gives the age distribution of students
admitted to a college in the years 1914 and 1918. Find which
of the two groups is more variable in age :

| Age | Number of students in 1914 | Number of students in 1918 |
|----|----|----|
| 15 | - | 1 |
| 16 | 1 | 6 |
| 17 | 3 | 34 |
| 18 | 8 | 22 |
| 19 | 12 | 35 |
| 20 | 14 | 20 |
| 21 | 13 | 7 |
| 22 | 5 | 19 |
| 23 | 2 | 3 |
| 24 | 3 | - |
| 25 | 1 | - |
| 26 | - | - |
| 27 | 1 | - |

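Solution :

| Age | 1914 (assumed mean 21): f | $x'$ | $fx'$ | $fx'^2$ | 1918 (assumed mean 19): f | $x'$ | $fx'$ | $fx'^2$ |
|----|----|----|-----|-----|-----|----|-----|-----|
| 15 | 0 | -6 | 0 | 0 | 1 | -4 | -4 | 16 |
| 16 | 1 | -5 | -5 | 25 | 6 | -3 | -18 | 54 |
| 17 | 3 | -4 | -12 | 48 | 34 | -2 | -68 | 136 |
| 18 | 8 | -3 | -24 | 72 | 22 | -1 | -22 | 22 |
| 19 | 12 | -2 | -24 | 48 | 35 | 0 | 0 | 0 |
| 20 | 14 | -1 | -14 | 14 | 20 | 1 | 20 | 20 |
| 21 | 13 | 0 | 0 | 0 | 7 | 2 | 14 | 28 |
| 22 | 5 | 1 | 5 | 5 | 19 | 3 | 57 | 171 |
| 23 | 2 | 2 | 4 | 8 | 3 | 4 | 12 | 48 |
| 24 | 3 | 3 | 9 | 27 | - | | | |
| 25 | 1 | 4 | 4 | 16 | - | | | |
| 26 | 0 | 5 | 0 | 0 | - | | | |
| 27 | 1 | 6 | 6 | 36 | - | | | |
| | 63 | | 28 - 79 = -51 | 299 | 147 | | 103 - 112 = -9 | 495 |

1914 Group

$$\sigma = \sqrt{\frac{\Sigma fx'^2}{N} - \left(\frac{\Sigma fx'}{N}\right)^2} = \sqrt{\frac{299}{63} - \left(\frac{-51}{63}\right)^2} = \sqrt{4.746 - .655} = 2.02$$

$$\bar{X} = A + \frac{\Sigma fx'}{N} = 21 + \left(\frac{-51}{63}\right) = 21 - .8 = 20.2$$

$$V = \frac{\sigma}{\bar{X}} \times 100 = \frac{2.02}{20.2} \times 100 = 10$$

1918 Group

$$\sigma = \sqrt{\frac{495}{147} - \left(\frac{-9}{147}\right)^2} = \sqrt{3.3673 - .0037} = \sqrt{3.3636} = 1.834$$

$$\bar{X} = 19 + \left(\frac{-9}{147}\right) = 19 - .06 = 18.94$$

$$V = \frac{1.834}{18.94} \times 100 = 9.68$$

The coefficient of variation of the 1914 group is 10 and
that of the 1918 group 9.68. This means that the 1914 group
is more variable.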





Lorenz Curve

The Lorenz curve is a graphic method of measuring
deviations from the average. It was devised by Dr. Lorenz for
measuring the inequalities in the distribution of wealth.
But it can be applied with equal advantage for comparing
the distribution of profits amongst different groups of businesses
and such other things. It is a cumulative percentage curve.
In it the percentages of items are combined with the
percentages of such other things as wealth, profits or turnover, etc.

Method of drawing. In drawing a Lorenz curve the following
steps are necessary :

1. The various groups of each variable should be reduced
to percentages. Thus if it is desired to show the distribution of
income amongst the various groups of population of a country
the various groups of population should be reduced in the form
of percentages of total population ; so also the incomes derived
by these groups in terms of the total income of the country.

2. The two sets of the percentages obtained by step 1
should then be cumulated and cumulative percentages thus
determined.

3. The cumulative percentages of these two variables
should then be plotted along the axis of Y and axis of X.
The scale along the axis of Y begins from zero at the point of
intersection and goes upward up to 100, while the scale along
the axis of X begins with 100 at the point of intersection and
goes up to zero towards the right.

4. The point 100 along the axis of Y and the point
0 along the axis of X should be joined by a straight line.
The line so obtained is called the line of equal distribution, and
serves as the basis for the determination of the extent to which
the actual distribution deviates from the ideal distribution given
by this line.

5. The actual data may now be plotted on this graph in
the ordinary manner and the plotted points may be connected
by means of a curve.

The farther the curve obtained under step 5 is from the line
of equal distribution, the greater is the deviation.



Illustration :

The following table gives the population and earnings of
two towns A and B. Represent the data graphically so as to
bring out the inequality of the distribution of earnings.

| Town A: Persons | Town A: Earnings | Town B: Persons | Town B: Earnings (Daily) |
|-------|-------|-----|-------|
| 100 | 75 | 50 | 80 |
| 100 | 100 | 70 | 120 |
| 100 | 150 | 30 | 60 |
| 100 | 225 | 25 | 140 |
| 100 | 325 | 100 | 200 |
| 100 | 375 | 45 | 200 |
| 100 | 450 | 30 | 140 |
| 100 | 600 | 80 | 460 |
| 100 | 850 | 20 | 120 |
| 100 | 1,850 | 50 | 480 |
| 1,000 | 5,000 | 500 | 2,000 |


Solution :

Table showing the populations and daily earnings of two towns A
and B.

| Town A: Persons, cumulative | Cumulative percentage | Town A: Earnings, cumulative | Cumulative percentage | Town B: Persons, cumulative | Cumulative percentage | Town B: Earnings, cumulative | Cumulative percentage |
|-------|-----|-------|------|-----|-----|-------|-----|
| 100 | 10 | 75 | 1.5 | 50 | 10 | 80 | 4 |
| 200 | 20 | 175 | 3.5 | 120 | 24 | 200 | 10 |
| 300 | 30 | 325 | 6.5 | 150 | 30 | 260 | 13 |
| 400 | 40 | 550 | 11 | 175 | 35 | 400 | 20 |
| 500 | 50 | 875 | 17.5 | 275 | 55 | 600 | 30 |
| 600 | 60 | 1,250 | 25 | 320 | 64 | 800 | 40 |
| 700 | 70 | 1,700 | 34 | 350 | 70 | 940 | 47 |
| 800 | 80 | 2,300 | 46 | 430 | 86 | 1,400 | 70 |
| 900 | 90 | 3,150 | 63 | 450 | 90 | 1,520 | 76 |
| 1,000 | 100 | 5,000 | 100 | 500 | 100 | 2,000 | 100 |
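Steps 1 and 2 of the method, which produce the cumulative percentages tabulated above, can be sketched in Python as follows. The function name `cumulative_percentages` is an illustrative assumption; plotting the resulting pairs against the line of equal distribution (steps 3 to 5) is left to any charting tool.

```python
def cumulative_percentages(values):
    """Cumulate the values and express each running total as a percentage."""
    total = sum(values)
    running = 0
    result = []
    for value in values:
        running += value
        result.append(100 * running / total)
    return result


town_a_persons = [100] * 10
town_a_earnings = [75, 100, 150, 225, 325, 375, 450, 600, 850, 1850]
town_b_persons = [50, 70, 30, 25, 100, 45, 30, 80, 20, 50]
town_b_earnings = [80, 120, 60, 140, 200, 200, 140, 460, 120, 480]

# Each Lorenz curve is the list of (persons %, earnings %) points.
curve_a = list(zip(cumulative_percentages(town_a_persons),
                   cumulative_percentages(town_a_earnings)))
curve_b = list(zip(cumulative_percentages(town_b_persons),
                   cumulative_percentages(town_b_earnings)))

for point_a, point_b in zip(curve_a, curve_b):
    print(point_a, point_b)
```

On the line of equal distribution the two percentages of every point would be equal; the gap between them at each point indicates how far the actual distribution departs from it.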








Fig. II. 










EXERCISES 


1. What do you understand by dispersion ? State the important
measures for computing it and point out their merits and demerits.

2. Explain by giving suitable examples the need for preferring relative
measures of dispersion to absolute measures.

3. Describe the methods of drawing Lorenz Curve and explain its
usefulness in statistical work.


4. Point out the range and its coefficient in the following questions of
chapter on Central Tendency :

Q. Nos. 8, 10, 14, 19, 22.

[3.1]

5. Calculate the mean deviation and its coefficient about mean, mode
and median for the following figures giving the monthly salaries of
some persons.

103, 50, m, no, 108, 105, 174, 103, 150, 200, 225, 350 and 103.

[3.4]

6. Compute mean deviation of the two series and point out which is
more variable.

I. No. Calcutta 93, 97, 95, 95, 95, 95, 97, 97, 92, 93, 89, 89.
I. No. Delhi 107, 108, WJ, 102, 102, 104, 107, 105, 102, 100, 97, 96.

J Cw., *95?) [3.5]

7. Calculate (a) Median coefficient of dispersion and (b) Mean
coefficient of dispersion from the following data :

Size of item 4 6 8 10 12 14 16

Frequency 2 4 5 3 2 1 4

(M.A., Agra, i$5f) [3.6]

8. Calculate the mean deviation from the following data. What light
does it throw on the social conditions of the community ?

Difference in age between husband and wife

Diff. in years 0-5, 5-10, 10-15, 15-20, 20-25, 25-30, 30-35, 35-40

Frequency 449, 703, 50?, 28J, 109, 32, 13, 4

(8 .CornBemkq?) (3.9)

9. Compute the mean deviation from the Median and from the Mean
for the following distribution of the scores of 50 college students.

Scores 140- 150- 160- 170- 180- 190- 200

Frequency 4 6 10 18 9 3

(M.Com., Banaras, 1957) [3.7]

10. Find the mean deviation about mean of the following data of ages
of married men in a certain country.

Age 15-24, 25-34, 35-44, 45-54, 55-64, 65-74

No. of married men 33 264 303 214 128 38

{*•1

11. Calculate quartile deviation and quartile coefficient of dispersion for
question nos. U, 28 and 31 of Central Tendency.





12. Calculate the quartile deviation for wages :

Wages in Rs. 30-32, 32-34, 34-36, 36-38, 38-40, 40-42, 42-44
Labourers 12 16 16 14 12 0 0

'<*!>) P-») 

13. Following figures give the marks of students in a class test. Find the
standard deviation of the marks.

14, 15, 2.3, 20, 10, 30, 19, i». 16, 25. {311) 

14. Calculate the mean and standard deviation of the following values of
the world's annual gold output (in millions of pounds) for 20 different
years.

94, 95, 96, 93, 87, 79, 73, 69, 68, 67, 78,

82, 83, 89, 95, 103, 108, 117, 130, 97.

Also calculate the percentage of cases lying outside the mean at
distances ±$\sigma$, ±2$\sigma$, ±3$\sigma$, where $\sigma$ denotes standard deviation.

Hear., ftyji p) (3.12) 

15. Calculate the standard deviation from the following data :

Size of the item 6 7 8 9 10 11 12

Frequency 3 6 9 13 8 5 4

(/* On.,,.Yagpur, r ygy) 13 ■ 14) 
16. From the following information about the accidents on a road in 200
days, calculate the mean number of accidents and the variance of
accidents :

No. of accidents per day 0 1 2 3 4 5

No. of day* 16 7(i 38 25 10 5 |3.15) 

17. The following is a random sample from a given population. Compute
the Arithmetic mean and the coefficient of variation.

Lower Class Boundary 9.5, 14.5, 19.5, 24.5, 29.5

Upper Class Boundary 14.5, 19.5, 24.5, 29.5, 34.5

Frequency 1, 7, 5, 5, 2

i Si Ctum ., Itanaras, (3.16) 

18. Calculate standard deviation for Q. No. 39 of Central Tendency.

(3.17)

19. Compute standard deviation of examination marks of the following
100 students :

Marks 10-20, 20-30, 30-40, 40-50, 50-60, 60-70, 70-80, 80-90

No. of students 19 3 2 49 24 2 0 1

B,Cam , fUiht, I#?,*} (3.18) 

20. Calculate mean and standard deviation for the following data :

Age under 10 20 30 40 50 60 70 80

No. of persons 15 30 53 75 100 110 115 125

(am., %, 1957 ; \%m 

21. The following gives the distribution by size of 40 different farms
selected at random from 800 farms. Obtain a rough estimate of the
standard deviation of the acreage of 800 farms.

Farm acreage,
nearest acre 1-10, 11-50, 51-100, 101-200, 201-300, 301-400, 401-500
No. of farms 13 9 0 7 4 5 2

(M.A., &*Uu t 1954) (3.21J 





22. Calculate standard deviation and coefficient of variation from the
following data giving the age distribution of 542 members of House of
Commons.

Age 20-30, 30-40, 40-50, 50-60, 60-70, 70-80, 80-90

No. of
Members 3 61 132 153 140 51 2

(B.Com., Delhi, 1955) (3.20)


23. From the following information regarding the marks obtained at
college and the competitive examinations, find which group is more
homogeneous in intelligence.

| College Exams.: Marks | No. of Students | Competitive Exams.: Marks | No. of Students |
|---------|----|-----------|----|
| 100-150 | 20 | 1200-1250 | 50 |
| 150-200 | 45 | 1250-1300 | H5 |
| 200-250 | 50 | 1300-1350 | 72 |
| 250-300 | 25 | 1350-1400 | 60 |
| 300-350 | 19 | 1400-1450 | 16 |

(B.Com., Agra) (3.35)

24. The marks obtained by the students of class A and B are given below.

Marks 5-10, 10-15, 15-20, 20-25, 25-30, 30-35, 35-40, 40-45

Class A 1 10 20 8 6 3 1

Class B 5 6 13 10 5 4 2 2

Calculate mean, mode, median and standard deviation for the
distribution. Explain your results regarding composition of the class
in respect of intelligence.





(ft Cam., 

fulfil, tyu) !<3yi 


25. From the prices of shares X and Y given below, state which is more
stable in value :

X 55. 54, 52, 53, 56. 56, 52, 50, 51, 40 

r 108, 107, 105, 105, m , 10 ", 101 , 103. 104, 101 

(B Com -, /fo*#r*r, [3 34j 

26. (a) Coefficients of variation of two series are 58% and 69%. Their
standard deviations are 21.2 and 15.6. What are their arithmetic
means ?


(b) When can coefficient of variation be greater than 100% ? What 

can you say about ihe items in such a case ? f3.4UJ 

27. (a) Mean of 100 items is 50 and their standard deviation is 4. Find
the sum of squares of all the items. (3.44)

(b) Means and standard deviations of two distributions of 100 and 150
items are 50, 5 and 40, 6 respectively. Find the mean and standard
deviation of all the 250 items taken together. [3.45]

(c) In two distributions of site 40 and 50 sum* of squares of the. devi¬ 
ation of items from their respective mean are %0 and 800, Calculate 
the standard deviation of the two distributions. 


28. The mean, standard deviation and range of a distribution of weights
of 12 children are 9 lb, 2 lb and 6 lb respectively. The median of
the distribution is the same as the mean. Find the mean and standard
deviation of the group if the lightest and the heaviest children are not
taken into account. [3.4?)





29. The mean and the standard deviation of a sample of 100 observations
were calculated as 40 and 5.1 respectively by a student who took by
mistake 50 instead of 40 for one observation. Calculate the correct
mean and standard deviation of the sample. p 44)

30. A distribution consists of three components with frequencies 200, 250,
300 having means 25, 10 and 15 and standard deviations 3, 4, 5
respectively. Find the mean and the standard deviation of the combined
distribution.

{M-Com& unarm, ttf$4) (3*46) 
31. The following is a record of the number of bricks laid each day for 20
days by two brick-layers A and B.

A : 725, 700. 750, 630, 675, 725, 675, 723, 625, 675, 700, 675, 725, 
675, 000, 650, 675, 625, 700, 650. 

H :• 575, 623, 600. 575, 675, 625, 575, 550, 650, 625, 550,700. 625,600, 
625, 050, 575, 675, 625, 600, 

Calculate the coefficient of variation in each case and discuss the
relative consistency of the two brick-layers. If the figures for A were
in every case 10 more and those of B in every case 20 more than the
figures given above, how would the answer be affected ?

A i,Cw.< BaKorai t #^ 0 ) (3-43) 

32. An analysis of the monthly wages paid to workers in two firms A and
B belonging to the same industry gives the following results :

| | Firm A | Firm B |
|---|---|---|
| No. of wage-earners | 506 | 646 |
| Average monthly wage | Rs. 52.5 | Rs. 47.5 |
| Variance of distribution of wages | 100 | 121 |

(a) Which firm, A or B, pays out the larger amount of monthly wages ?

(b) In which firm, A or B, is there greater variability in individual
wages ?

(c) What are the measures of average monthly wage and the
variability in individual wages of all the workers in the firms A and B
taken together ?

I A S., 1331) (3.48) 

33. A collar manufacturer is considering the production of a new style of
collars to attract young men. The following statistics of neck
circumference are available based on measurement of a typical group.

Mid value (in inches) 12.5, 13.0, 13.5, 14.0, 14.5, 15.0, 15.5, 16.0, 16.5
No. of students 4 19 30 63 66 20 18 1 1

Compute the standard deviation and use the criterion $\bar{X}$ ± 3$\sigma$ to
obtain the largest and smallest size of collar he should make in order
to meet the need of practically all his customers bearing in mind that
collars are worn on an average 3/4 inch larger than neck size.

(3.3BJ 

34. Following table gives the weight in pounds of the students of a school.
Find the coefficient of variation for the distribution :

Tens i 0 1 2 3 4 3 6 7 8 0 

w— t 3 i — 1 2 1 — — r 

90 5 4 5 2 861253 

100 8 5 2 4 2 5 3145 

JIO 7 5 3 3 2 3 2 4 2 1 

120 21 I 23 — — 41 1 

( 3 . 49 ) 


Chapter 12 

Skewness, Moments and Kurtosis 


Up to this point two attributes of frequency distribution, its
central tendency and its dispersion, have been explained.
These two types of attributes tell us a good deal about the data
from which they are computed. The so-called 'averages'
represent the data in a condensed form. The dispersion of the data
tells us the extent to which the items differ from the central
tendency. But none of these attributes tells us about the form
or type of the distribution, i.e., whether the number of items
above the mode exceeds that below or vice versa.

Type of distribution 

By the type of distribution we mean the approach to or
divergence from symmetry. In a symmetrical distribution, if
the frequency curve is folded on the ordinate at the mean, the
two halves of the curve coincide. In such a distribution, the
mean and median are identical and 50% of the total items lie
above the mean and 50% below it. If the distribution has only
one mode then this measure will also coincide with the mean
or median. If the distribution is bi-modal (having two modes)
then these will be symmetrically placed on both sides of the
mean. Any distribution which is not symmetrical is asymmetrical
or skew.

The following are the various types of frequency distri¬ 
bution : 

1. Normal distribution. Its curve is called normal curve or
normal curve of error. Every normal distribution is a symmetrical
one but every symmetrical distribution is not normal.
The symmetry of the distribution is the first requisite of
normality.

2. Asymmetrical distribution in which the frequencies of
the class intervals decline more quickly on one side of the mode
than on the other.

3. J-shaped distribution in which the frequency curve is
of the shape of J or of a reversed J, i.e., the frequencies of the
class intervals either decrease throughout or increase throughout
as the value of the variate increases.

4. U-shaped distribution in which the frequencies fall to a
minimum level and then rise, the frequency curve being thus of
the shape of the letter 'U'.

All these types of distribution are illustrated below. 

The following table shows the heights of the students of a 
college : 


TABLE 12.1

Class           A       B       C       D       E       F
                f       f       f       f       f       f
56.5–58.5       5       3       0       4       5      29
58.5–60.5      25       5       4       8       3      17
60.5–62.5      15      20      40      20       5       8
62.5–64.5      10      44      24      24      10       0
64.5–66.5      15      20      20      40      15      10
66.5–68.5      25       5       8       4      20      15
68.5–70.5       5       3       4       0      42      21
N             100     100     100     100     100     100
Mean         63.5    63.5    63.5    63.5    65.6      63
Median       63.5    63.5      63      64    67.7    61.5
Mode            —    63.5    61.9    65.1    69.2








The histograms and the corresponding curves are drawn in
figs. 12.1, 12.2 and 12.3.



Fig. 12.1

A glance at these figures reveals that curves A and B are
symmetrical whereas curves C, D, E and F are asymmetrical.
Histograms and curves A and B are symmetrical and median
and mean in each coincide. And if we fold curve A or B and
histograms A and B on the ordinate at the mean, the two halves
of the curve or histogram will coincide. Now curve B is a
normal curve whereas curve A is not, for the simple reason that
curve A is bimodal (having two modes) whereas curve B is
unimodal (having one mode). The curve A has two modes
which are symmetrically placed on each side of the mean.
Curve B is mound shaped, that is, it has fewer items at the
extremes and more items towards the centre. In curve B all the
three measures of central tendency coincide.












Fig. 12.2 

Curve E and histogram E indicate J-shaped distribution and
curve F and the corresponding histogram F exhibit U-shaped
distribution. Curves C, D and E are asymmetrical or skew.
Because in histogram C, Mean is greater than Median and
Median is greater than the Mode and in histograms D and E
Mean is less than Median and Median is less than Mode, all of
them are skew. But histogram C is positively skew whereas
histograms D and E are negatively skew. In histogram C, the
three measures of central tendency differ from one another ;
the mode is 61.9, the median is 63 and the mean is 63.5.
The extreme variations towards the higher values give the
histogram a longer tail to the right and this pulls the median
and mean in that direction away from the mode.



Fig. 12.3 

In histogram D, too, the measures of central tendency differ
from one another, but here the mode is the highest. The
extreme variations towards the lower values give the histogram
a longer tail to the left and this pulls the median and mean in
that direction from the mode.

In skew distributions, the lower and upper quartiles are not
equidistant from the median. Similarly corresponding pairs of
deciles are not equidistant from the median. But in a symmetrical
distribution the numerical distances from the median
to the lower and upper quartiles are equal.

From the above discussion, we can summarise the tests for 
the presence of skewness in the following words : 

1. when the graph of the distribution does not show the
normal bell shaped, symmetrical curve ;

2. when the three measures of central tendency differ
from one another ;

3. when the sum of the positive deviations from the
median is not equal to the sum of the negative deviations from
the same value ;

4. when the distances from the median to the quartiles
are unequal ;

5. when corresponding pairs of deciles or percentiles are
not equidistant from the median.

Measures of Skewness 

When a distribution is asymmetrical we usually call it a
skew frequency distribution and the measures of asymmetry are
usually called measures of skewness. Measures of skewness
exhibit the position relative to the median or mode at which
the items are pulled away, or diverted from symmetry. The
measures of skewness show the manner in which deviations are
distributed, i.e., the direction of the deviations. The usual
measures of skewness are applied to those frequency distributions
where a mode is present, and the distribution of items
around it has a tendency to be regular and systematic but there
is not a perfect balance on both sides of the mode. It is with
this type of curve that we will deal in the following discussions.

Various measures of skewness have been put forward and
none has been uniformly employed. For this reason when one
gives a measure of skewness, it is advisable to tell the way in
which it is computed. The usual methods are discussed below.

Skewness measured by Relationship between
3 M's of Central Tendency

As explained earlier, in a skew curve the three M's of
central tendency differ. Thus it is possible to measure the
absolute amount of skewness by noting the amount of the
divergence between the mean, the median and the mode of the
distribution. That is, the degree of asymmetry is indicated by
the extent to which the three averages differ from each other.
We have seen earlier that mode is not affected by the size or
number of deviations above or below it. In other words, the
mode will remain at the point of greatest frequency, whether
or not the curve is symmetrical. The median of a distribution
is always affected by the number of deviations but not by their
size. The mean of a distribution is affected by the size as well
as the number of deviations. In a perfectly symmetrical
distribution, "the errors in excess and those in defect of the
mean are equal in extent of deviation; those which are positive
cancel those which are negative," with the result that the mean
coincides with the mode. In a skew distribution, on the other
hand, the mean may be greater or less than the mode depending
upon the direction of asymmetry. (When skewness is +ve
the mean will be greater than the mode. When skewness is −ve
the mean will be less than the mode.) Skewness may, accordingly,
be measured by the difference between the mean and the
mode :

Absolute Skewness = Mean − Mode.

(+) or (−) signs obtained by the above formula will exhibit
the direction of skewness and the difference will reveal the
amount of absolute skewness. This is a measure of the absolute
amount of skewness but how significant is this amount ? One
device is to convert the absolute skewness into relative skewness.
And for this conversion we will have to compare the displacement
of the averages with some standard measure of dispersion.
Karl Pearson has given the following formula for measuring
relative skewness.

$$SK = \frac{\bar{X} - \text{Mode}}{\sigma}$$


For concrete data, however, the mode is not easily located
and is so much affected by grouping errors that it becomes
unreliable. Further, different methods of locating mode will
give different conclusions. In a sample distribution, too, the
mode is difficult to locate and is subject to wide fluctuations of
sampling. For these reasons the above formula does not seem
satisfactory and is of little value as a description of observed
data. The difficulty can, however, be overcome in the following
manner. In a unimodal series, when the asymmetry is
not too great, the mean tends to lie about three times as far
from the mode as from the median, i.e.,

Mode = Mean − 3 (Mean − Median)

or Mode = 3 Median − 2 Mean

Substituting the value of mode as above in our formula, the
skewness can also be measured as :

$$\text{Coefficient } SK = \frac{\bar{X} - (3\,\text{Median} - 2\bar{X})}{\sigma} = \frac{3(\bar{X} - Md)}{\sigma}$$


Illustration :

Find the skewness from the following data :


TABLE 12.2

Height in inches     Number of persons
       58                   10
       59                   18
       60                   30
       61                   42
       62                   35
       63                   28
       64                   16
       65                    8


Solution :

Height is a continuous variable, and hence 58" must be
treated as 57.5"–58.5" ; 59" as 58.5"–59.5", and so on.

Height        Class          f      dx        fdx      fdx²    Cumulative
in inches                           from 61                    frequency
   58       57.5–58.5       10      −3        −30       90        10
   59       58.5–59.5       18      −2        −36       72        28
   60       59.5–60.5       30      −1        −30       30        58
   61       60.5–61.5       42       0          0        0       100
   62       61.5–62.5       35       1         35       35       135
   63       62.5–63.5       28       2         56      112       163
   64       63.5–64.5       16       3         48      144       179
   65       64.5–65.5        8       4         32      128       187

Total                      187               −96       611
                                            +171
                                           = +75


$$\text{Mean} = 61 + \frac{75}{187} = 61.4$$

$$\text{Mode} = 60.5 + \frac{35}{65} = 61.04$$

$$\sigma = \sqrt{\frac{611}{187} - \left(\frac{75}{187}\right)^2} = \sqrt{3.27 - .16} = \sqrt{3.11} = 1.76$$

Skewness = 61.4 − 61.04 = .36 inches

$$\text{Coefficient of Skewness} = \frac{.36}{1.76} = .20$$

Median = the size of $\frac{187}{2}$th item = 93.5th item

$$= 60.5 + \frac{35.5}{42} = 61.35$$

Skewness = 3 (61.4 − 61.35) = 3 (.05) = .15

$$\text{Coefficient of Skewness} = \frac{.15}{1.76} = .085$$
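The arithmetic of this illustration can be retraced with a short Python sketch. It is only a check on the working above, not part of the original solution: the frequency of 18 for 59 inches follows the solution table, the mode is interpolated as L + f2/(f0 + f2) × i (the rule the working above appears to use), and all variable names are illustrative. Because the sketch carries unrounded values throughout, its coefficients may differ a little in the last digits from the hand computation.

```python
import math

# Mid-values (inches) and frequencies of Table 12.2, as used in the solution.
heights = [58, 59, 60, 61, 62, 63, 64, 65]
freqs = [10, 18, 30, 42, 35, 28, 16, 8]
WIDTH = 1.0  # each height h stands for the class (h - 0.5) to (h + 0.5)

n = sum(freqs)
mean = sum(f * x for x, f in zip(heights, freqs)) / n
sd = math.sqrt(sum(f * (x - mean) ** 2 for x, f in zip(heights, freqs)) / n)

# Median: interpolate inside the class that contains the (n/2)th item.
target, cum = n / 2, 0
for x, f in zip(heights, freqs):
    if cum + f >= target:
        median = (x - WIDTH / 2) + (target - cum) / f * WIDTH
        break
    cum += f

# Mode: lower limit of the modal class + f2 / (f0 + f2) * width.
# Assumes the modal class is not the first or the last class.
k = freqs.index(max(freqs))
f_before, f_after = freqs[k - 1], freqs[k + 1]
mode = (heights[k] - WIDTH / 2) + f_after / (f_before + f_after) * WIDTH

sk_mode = (mean - mode) / sd          # coefficient based on mean and mode
sk_median = 3 * (mean - median) / sd  # coefficient based on mean and median

print(f"mean={mean:.2f} sd={sd:.2f} median={median:.2f} mode={mode:.2f}")
print(f"SK (mode)={sk_mode:.3f}  SK (median)={sk_median:.3f}")
```

Both coefficients come out positive, which agrees with the mean lying above the median and the mode in this series.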

Quartile Measure of Skewness

In the above two methods of measuring skewness, the whole
of the series is taken into consideration. But absolute as well
as relative skewness may be secured even for a part of the
series. The usual device is to measure the distance between
the lower and the upper quartiles. In a symmetrical series
the quartiles would be equidistant from the value of the
median, i.e.,

$$\text{Median} - Q_1 = Q_3 - \text{Median}.$$

In other words the value of the median is the mean of $Q_1$
and $Q_3$. In a skewed distribution, quartiles would not be
equidistant from median unless the entire asymmetry is located
at the extremes of the series. Bowley has suggested the following
formula for measuring skewness, based on the above facts :

$$\text{Absolute } SK = (Q_3 - Med) - (Med - Q_1) = Q_3 + Q_1 - 2\,Med.$$

If the quartiles are equidistant from the median, i.e.,
$(Q_3 - Med) = (Med - Q_1)$, then SK = 0. If the distance from
the median to $Q_1$ exceeds that from $Q_3$ to the median, this will
give a negative skewness. If the reverse is the case, it will give
a positive skewness.

If the series expressed in different units are to be compared,
it is essential to convert the absolute amount into the relative.
Using the interquartile range as a denominator we have for
the coefficient of skewness the following :

$$\text{Relative } SK = \frac{Q_3 + Q_1 - 2\,Med}{Q_3 - Q_1}$$

or

$$\text{Relative } SK = \frac{(Q_3 - Med) - (Med - Q_1)}{(Q_3 - Med) + (Med - Q_1)}$$

If in the series the median and lower quartile coincide
then the SK becomes (+1). If the median and upper quartile
coincide then the SK becomes (−1).

This measure of skewness is rigidly defined and easily
computable. Further such a measure of skewness has the advantage
that it has value limits between (+1) and (−1), with the result
that it is sufficiently sensitive for many requirements. The only
criticism levelled against such a measure is that it does not
take into consideration all the items of the series, i.e., extreme
items are neglected. In measuring skewness we are interested
in the extreme items also. To achieve this end Bowley's
measure can be enlarged by taking two deciles (or percentiles),
equidistant from the median value. Kelly has suggested the
following measure of skewness :

SK -P w ..2. 

n n\hD,i 

.' 2 

Though such a measure has got little practical use, yet
theoretically this measure seems very sound.

So far we have discussed three methods of measuring
skewness. There are two more methods : (i) based on third
moment, (ii) based on Beta coefficient. These two methods of
measuring skewness will be explained after the explanation of
Moments of Dispersion.

Computation of skewness based on the position of
quartiles

In the illustration of Table 12.2,

$Q_1$ = the size of $\frac{N}{4}$th item = 46.75th item

$$= 59.5 + \frac{18.75}{30} = 59.5 + .63 = 60.13$$

$Q_3$ = the size of $\frac{3N}{4}$th item = 140.25th item

$$= 62.5 + \frac{5.25}{28} = 62.5 + .19 = 62.69$$

Skewness = 62.69 + 60.13 − 2 (61.35) = .12

$$\text{Coefficient of skewness} = \frac{.12}{62.69 - 60.13} = \frac{.12}{2.56} = .047$$
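The quartile working can be checked in the same way. The sketch below is illustrative only: the helper `quantile_grouped` is a hypothetical name, and it assumes, as the solution does, that each height stands for a class one inch wide and that items are spread evenly within a class.

```python
def quantile_grouped(values, freqs, p, width=1.0):
    """Value of the (p * N)th item, interpolated inside its class."""
    n = sum(freqs)
    target, cum = p * n, 0
    for x, f in zip(values, freqs):
        if f > 0 and cum + f >= target:
            return (x - width / 2) + (target - cum) / f * width
        cum += f
    raise ValueError("p must lie between 0 and 1")


# Table 12.2, with the frequencies used in the worked solution.
heights = [58, 59, 60, 61, 62, 63, 64, 65]
freqs = [10, 18, 30, 42, 35, 28, 16, 8]

q1 = quantile_grouped(heights, freqs, 0.25)
med = quantile_grouped(heights, freqs, 0.50)
q3 = quantile_grouped(heights, freqs, 0.75)

absolute_sk = q3 + q1 - 2 * med        # Bowley's absolute measure
relative_sk = absolute_sk / (q3 - q1)  # Bowley's coefficient

print(f"Q1={q1:.2f} Med={med:.2f} Q3={q3:.2f}")
print(f"absolute SK={absolute_sk:.3f}  coefficient={relative_sk:.3f}")
```

The coefficient necessarily lies between −1 and +1, since the numerator is the difference of the two distances whose sum forms the denominator.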




So far we have discussed that a frequency distribution may
be characterised by its mean, median, quartiles ; its range and
standard deviation ; and its skewness. All these measures of
central tendency, variability, and skewness, may be grouped
under two main heads : (1) Percentile system ; and (2) Moment
system.

The family of percentile system includes median, quartiles,
deciles, percentiles, etc. All members of this family are
measures obtained by computing a value which corresponds to
a given proportion of frequency.

The family of moment system includes mean, average
deviation, standard deviation and the like. All members of this
family are measures based on deviations of items from some
origin. The notion of 'moments' in statistics is analogous to
that in mechanics. In mechanics 'Moment' is a measure of
the tendency of a force to produce rotation. The tendency to
produce rotation depends upon :

(1) The magnitude of the force,

(2) The line of action of the force.

Let us have a yardstick AB (marked in inches) which is
free to turn about some fixed point D called the fulcrum.
The tendency of the stick to rotate depends upon the magnitude
of force and the distance from the fulcrum to the point at
which the force is exerted.

Fig. 12.4

Suppose we placed a weight of 1 seer at C, 9 inches to the
left of D. Obviously we could keep the yard from rotating if
we placed a weight of 1 seer at E, 9 inches to the right of D.
We could also keep the yard from rotating if we placed a ½
seer weight at B, 18 inches to the right of D.



Fig. 12.5

In the language of mechanics the tendency of the force to
produce rotation is measured by the product of the magnitude
of force and the length of the perpendicular from the fulcrum
to the line of action of the force. In the above illustration :

the magnitude of force to the right of D at B = ½ seer,
the magnitude of force to the left of D at C = 1 seer,
the distance from B to D = 18 inches,
the distance from C to D = 9 inches.

If the yard is in equilibrium, the tendency to rotate towards
the left must be equal to the tendency to rotate towards the
right, i.e., ½ × DB = 1 × CD or ½ × 18 = 1 × 9. This statement
would be correct whatever be the amount of force.

If a number of forces, $f_1, f_2, f_3, \ldots$ at distances $x_1, x_2, x_3, \ldots$ are
applied, the moment of the first force about the origin is $f_1 x_1$ ;
the moment of the second force is $f_2 x_2$ ; and so on. These
moments are additive so that $\Sigma f x$ is the total moment about
the origin. If the total moment is divided by the total force,
the quotient is termed 'a moment coefficient'. The formula is

$$\frac{\Sigma f x}{N}, \text{ where } N = \Sigma f, \text{ i.e., total force.}$$



TABLE 12.3

Class        Frequency
 0– 5            4
 5–10            8
10–15           20
15–20           24
20–25           40
25–30            4
30–35            0
            N = 100


Imagine that of the above data a histogram has been cut
of material having weight, and that it rests upon a lever-arm
with fulcrum at zero, as in fig. 12.6.



Fig. 12.6

Force $f_1$ (4) is at distance $x_1$ (2.5)

Force $f_2$ (8) is at distance $x_2$ (7.5)

Force $f_3$ (20) is at distance $x_3$ (12.5)

...

Force $f_7$ (0) is at distance $x_7$ (32.5)

Thus the moment of the 1st force = 10

the moment of the 2nd force = 60

the moment of the 3rd force = 250

...

the moment of the 7th force = 0

Total Moment about the origin = 10 + 60 + 250 + 420 + 900 + 110 + 0 = 1,750

Total force $\Sigma f$ = 100

$$\therefore \text{Moment coefficient} = \frac{\Sigma f x}{\Sigma f} = \frac{1{,}750}{100} = 17.5$$

Thus 17.5 is the turning tendency about O of the entire
distribution. This is also called the first moment about zero.

Suppose now that the fulcrum is moved to the point
corresponding to the mean average. The lever will be balanced,
the positive moment equal to negative moment as in fig. 12.7.



Fig. 12.7
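The lever-arm picture can be verified numerically. The following Python sketch is an illustration, not the author's working: it takes the class mid-points of Table 12.3 as the distances and the frequencies as the forces, and the function name `moment_coefficient` is a hypothetical helper.

```python
# Class mid-points and frequencies of Table 12.3.
midpoints = [2.5, 7.5, 12.5, 17.5, 22.5, 27.5, 32.5]
freqs = [4, 8, 20, 24, 40, 4, 0]


def moment_coefficient(xs, fs, origin=0.0, order=1):
    """Total moment about `origin` divided by the total force (sum of f)."""
    total_force = sum(fs)
    total_moment = sum(f * (x - origin) ** order for x, f in zip(xs, fs))
    return total_moment / total_force


# Fulcrum at zero: the moment coefficient is the arithmetic mean.
first_about_zero = moment_coefficient(midpoints, freqs)

# Fulcrum at the mean: positive and negative moments cancel.
first_about_mean = moment_coefficient(midpoints, freqs, origin=first_about_zero)

print(first_about_zero, first_about_mean)
```

Passing `order=2` or higher to the same helper gives the higher moment coefficients about whichever origin is chosen.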






The formula for the 'moment coefficient' about mean
would be :

$$\frac{\Sigma f x}{N}$$

It will be recalled that formula for computing the arithmetic
mean is $\frac{\Sigma f X}{N}$, which is the same as for moment coefficient
around zero. This identity has led statisticians to speak of the
mean as the 'first moment about the origin'. In fact mean is
a moment coefficient and not a total moment. But in statistics
the distinction between total moment and moment coefficient
is neglected and mean is often called the 1st moment.

In general an expression of the form $\frac{\Sigma f x}{N}$, where $x$ represents
distance from Arithmetic mean, is represented by the
Greek letter $\mu$ (Mu). An expression of the form $\frac{\Sigma f X}{N}$,
where $X$ represents distance from any point, is represented by
the Greek letter $\nu$ (Nu).

The concept of moments may be extended to higher powers.
So in statistics $\frac{\Sigma (f x^2)}{N}$ is termed 'the second moment about
the mean' and $\frac{\Sigma (f x^3)}{N}$ is termed the third moment.

Moments can be calculated from any of the following
three points :

(1) from (true) mean, where nth moment about the
mean is

$$\mu_n = \frac{\Sigma f (X - \bar{X})^n}{N} = \frac{\Sigma f x^n}{N}$$

(2) from the origin taken at zero, i.e., where the assumed
mean A = 0, where nth moment about zero is

$$\nu_n = \frac{\Sigma f (X - 0)^n}{N} = \frac{\Sigma f X^n}{N}$$

(3) from any arbitrary origin or assumed mean, where
nth moment about any arbitrary origin is

$$\nu'_n = \frac{\Sigma f (X - A)^n}{N} = \frac{\Sigma f x'^n}{N}$$

Thus

$$\mu_1 = \frac{\Sigma (f x)}{N}, \qquad \mu_2 = \frac{\Sigma (f x^2)}{N}, \qquad \mu_3 = \frac{\Sigma (f x^3)}{N}$$

where $x = (X - \bar{X})$.

The moments about zero are :

$$\nu_1 = \frac{\Sigma (f X)}{N}, \qquad \nu_2 = \frac{\Sigma (f X^2)}{N}, \qquad \nu_3 = \frac{\Sigma (f X^3)}{N}$$

Moments about any arbitrary origin are :

$$\nu'_1 = \frac{\Sigma f x'}{N} = \frac{\Sigma f (X - A)}{N}$$

$$\nu'_2 = \frac{\Sigma f x'^2}{N} = \frac{\Sigma f (X - A)^2}{N}$$

The first moment about the mean is the total of 'the deviations
from the mean multiplied by their corresponding
frequencies and divided by the total number of cases'. To
obtain the second moment the deviations are squared before
adding. To obtain the third moment the deviations are cubed
before adding. To obtain the nth moment the deviations are
raised to the nth power before adding. We have already seen
that the 1st moment about the mean is zero. About other
moments around mean, we can generalise :

I. In all frequency distributions :

$\mu_0 = 1$

$\mu_1 = 0$

$\mu_2 = \sigma^2$

II. In symmetrical distributions :

$\mu_1 = 0$

$\mu_3 = 0$

$\mu_5 = 0$

Relation between $\mu$ and $\nu$

If we are given the moments about any arbitrary origin
(including 0) then we can compute moments about mean by
the following formulae :


* The method of computing moments about the mean from moments about
the arbitrary origin can be easily remembered by understanding
the following :

;if ■! t ■ >. i- f»* ■; v, 4? where 4 A) 

Zf*') 

" “ .V.** 

'■ s g </>'■ -v>»j-v.4f 2 

■’ . * "■ *i .V,* 

v., *VrM ■- rf* 

' 1 ■Sv-jjVj 4 X* 

* i ! v ■ -4# v 1 - v | ■ • 4\> j <f* 4 «f 4 

. V| ■»' t*y 2 jr| 3 I 4 





$$\mu_1 = \nu'_1 - \nu'_1 = 0$$

$$\mu_2 = \nu'_2 - (\nu'_1)^2$$

$$\mu_3 = \nu'_3 - 3\nu'_1 \nu'_2 + 2(\nu'_1)^3$$

$$\mu_4 = \nu'_4 - 4\nu'_1 \times \nu'_3 + 6\nu'_2 \times (\nu'_1)^2 - 3(\nu'_1)^4$$

An important corollary follows from the above, i.e., the
mean square deviation about the mean of the observations is
less than the mean square deviation about any arbitrary origin.
In other words, the mean square deviation or variance ($\sigma^2$) about
the mean is minimum, smaller than it would be if computed
from any other average. So from the equation $\mu_2 = \nu'_2 - (\nu'_1)^2$,
since $(\nu'_1)^2$ is a positive quantity, being a square, $\mu_2$ must be
less than $\nu'_2$.
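The conversion from moments about an arbitrary origin to moments about the mean can be tried on a small set of figures. The Python sketch below is only an illustration: the four values and the origin A = 3 are invented for the purpose, and the function names are hypothetical. It applies the four standard conversion formulae and compares the result with moments computed directly from the mean.

```python
import math


def central_from_arbitrary(v1, v2, v3, v4):
    """Moments about the mean from the first four moments about any origin."""
    mu1 = v1 - v1
    mu2 = v2 - v1 ** 2
    mu3 = v3 - 3 * v1 * v2 + 2 * v1 ** 3
    mu4 = v4 - 4 * v1 * v3 + 6 * v2 * v1 ** 2 - 3 * v1 ** 4
    return mu1, mu2, mu3, mu4


def moment(data, origin, order):
    """Moment coefficient of ungrouped data about the given origin."""
    return sum((x - origin) ** order for x in data) / len(data)


data = [1, 2, 4, 7]   # made-up values, purely for illustration
A = 3                 # made-up arbitrary origin

about_A = [moment(data, A, k) for k in (1, 2, 3, 4)]
converted = central_from_arbitrary(*about_A)

mean = sum(data) / len(data)
direct = [moment(data, mean, k) for k in (1, 2, 3, 4)]

agree = all(math.isclose(c, d, abs_tol=1e-9) for c, d in zip(converted, direct))
print(converted, agree)
```

The corollary can be seen in the same figures: the second moment about A = 3 is 5.5, whereas the second moment about the mean is 5.25.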

Measures of skewness based on moments

It will be recalled that in symmetrical distribution, all the
odd moments about mean, i.e., $\mu_1$, $\mu_3$, $\mu_5$, etc., are equal to
zero. If the odd moments (other than $\mu_1$) are not equal to zero
then it means that the distribution is skewed. But the computation
of odd moments alone is not a satisfactory method of
measuring skewness. To exhibit the degree of asymmetry we
must relate these moments to the standard deviations. Thus
the various moments divided by the proper power of the
standard deviation give us another family of useful coefficients
which we denote by the Greek letter $\alpha$ (alpha).

Symbolically :

$$\text{(Alpha one) } \alpha_1 = \frac{\mu_1}{\sigma} = 0 \qquad \text{(Alpha two) } \alpha_2 = \frac{\mu_2}{\sigma^2} = 1$$

$$\text{(Alpha three) } \alpha_3 = \frac{\mu_3}{\sigma^3} \qquad \text{(Alpha four) } \alpha_4 = \frac{\mu_4}{\sigma^4}$$

$$\text{(Alpha five) } \alpha_5 = \frac{\mu_5}{\sigma^5}$$

† These measures may be in terms of class interval units or units of one.
In the former case we will have to multiply $\mu_1$, $\mu_2$, $\mu_3$ and $\mu_4$ by Ci, (Ci)²,
(Ci)³ and (Ci)⁴ respectively. Ci represents the class interval.

"“ff* if both are r\pr***rd either in rb« interval unit* or in units 

of one, 

* f*t* Hh f **4 n should tw expressed either in class interval units or 
unit* of one. The unit used should be the tame. 



Since $\alpha_1$ is always zero for every distribution ($\alpha_1 = \frac{\mu_1}{\sigma} = 0$) it
is useless as a test of skewness. $\alpha_3$ is preferable to any higher
moment as it is easier to calculate and also because the higher
the moment the more will it vary from sample to sample.
Thus, the measure of skewness is

$$\alpha_3 = \frac{\mu_3}{\sigma^3}.$$

The positive and negative sign of $\alpha_3$ will have the same
significance as the sign of (mean − mode) has.

Another measure of skewness is obtained by the following
formula :

$$\frac{\sqrt{\beta_1}\,(\beta_2 + 3)}{2(5\beta_2 - 6\beta_1 - 9)}$$

where $\beta_1 = \frac{\mu_3^2}{\mu_2^3}$ and $\beta_2 = \frac{\mu_4}{\mu_2^2}$.

This measure will be positive if the mean exceeds, and
negative if the mean falls short of, the mode.

Kurtosis

So far we have characterised a frequency distribution by its
central tendency, variability, and the extent of asymmetry.
There remains but one more common type of attribute of
frequency distribution. Look at fig. 12.8, in which are
drawn three symmetrical curves A, B and C.







The three curves differ widely in regard to convexity, an
attribute to which Karl Pearson referred as 'Kurtosis'. The
measure of kurtosis exhibits the extent to which the curve is
more peaked or more flat-topped than the normal curve. A
curve to be called a normal curve must have its convexity as
shown in curve B [in addition to two other requisites, viz., (i)
Unimodal, (ii) Symmetrical]. When the curve of a distribution
is relatively flatter than the normal curve, it is said to have
kurtosis. When the curve or polygon is relatively more peaked,
it is said to lack kurtosis.

Karl Pearson gave the name 'Mesokurtic' to a normal curve
or a skewed curve that has the same degree of convexity as
the normal curve. In fig. 12.8 curve B is a mesokurtic curve.
If some of the cases about one standard deviation from the mean
move in towards the centre and others move out towards the
tails, thus making the curve unusually peaked, we say that
the result would be a 'Leptokurtic' curve, as curve A. If, on
the other hand, some of the cases around the mode move out
a little towards each half of the curve thus making the curve
unusually flat-topped, we say that the result would be a
'Platykurtic' curve, as curve C.

To quote Walker, "The terms platykurtic, leptokurtic,
mesokurtic, are not particularly important but they are rather
interesting and roll pleasantly under one's tongue." An
amusing sentence written by 'Student' (a British statistician)
was quoted by him which reads "the platykurtic curves, like
the platypus, are squat with short tails, while leptokurtic curves
are high with long tails like the kangaroo which leaps."

The kurtosis is measured by :

$$\beta_2 = \frac{\mu_4}{\mu_2^2}$$

In a normal curve $\beta_2$ will be equal to three. If greater than
three, the curve is more peaked ; if less than three, the curve is
flatter at the top than normal.

The above formula may be rewritten as :

$$K = \frac{\mu_4}{\mu_2^2} - 3$$



If K is positive, it means that the number of cases near
the mean is greater than in normal distribution. If K is negative,
the curve is more flat-topped than the corresponding
normal curve.

The measures of skewness and kurtosis are expressed in
terms of the Greek letter Gamma.

$$\gamma_1 = \sqrt{\beta_1}, \qquad \gamma_2 = \beta_2 - 3$$

$\gamma_1$ and $\gamma_2$ are the measures of skewness and kurtosis respectively.
If $\gamma_1$ is more than zero, then the conclusion will be the
presence of positive skewness ; if $\gamma_1$ is less than zero, it will
mean negative skewness, and in case $\gamma_1$ is zero, then there will
be absence of skewness.

Similarly if a curve is Leptokurtic, $\gamma_2$ will be positive, i.e.,
more than zero ; if Platykurtic, $\gamma_2$ will be negative ; and in
case Mesokurtic, $\gamma_2$ will be exactly zero.

Sheppard's Correction for Grouping Errors


Computations of mean, standard deviation, etc., from the
grouped data of a frequency distribution are based upon the
assumption that all the values in a class interval are concentrated
at the centre of that interval. In unimodal (or single
peaked) symmetrical distributions this assumption results in a
systematic error in the calculation of the even moments. For
the odd moments, the sum total of the errors on account of the
above presumption is zero in such symmetrical distributions,
because of the neutralizing effect of the errors as these appear
with positive and negative signs.

Such errors may with advantage be corrected by the application
of the formula propounded by W. F. Sheppard. The
correction formulae for 2nd and 4th moments are :

$$\mu_2 \text{ (Corrected)} = \text{Uncorrected } \mu_2 - \frac{i^2}{12}$$

$$\mu_4 \text{ (Corrected)} = \text{Uncorrected } \mu_4 - \frac{1}{2} i^2 \mu_2 \text{ (Uncorrected)} + \frac{7}{240} i^4$$

(where $i$ is the class interval)

The use of this correction should be restricted to cases where :

(1) the data are grouped, i.e., the series is not discrete but
continuous ;

(2) the distribution is symmetrical or moderately skewed, i.e.,
it tapers off to zero in both directions ;

(3) the total frequency is sufficiently large, say 1,000 ; and

(4) the class interval is more than 1/20th of the range.


Illustration :

From the following frequency distribution calculate the first
four moments, and $\beta_1$, $\beta_2$.

Class        f
10–14        1
15–19        4
20–24        8
25–29       19
30–34       35
35–39       20
40–44        7
45–49        5
50–54        1


Solution :

Let the arbitrary origin A = 32

Class     Mid-      f      x'     fx'     fx'²     fx'³     fx'⁴    Cumulative
          points                                                        f
10–14      12       1     −4      −4      16      −64      256         1
15–19      17       4     −3     −12      36     −108      324         5
20–24      22       8     −2     −16      32      −64      128        13
25–29      27      19     −1     −19      19      −19       19        32
30–34      32      35      0       0       0        0        0        67
35–39      37      20      1      20      20       20       20        87
40–44      42       7      2      14      28       56      112        94
45–49      47       5      3      15      45      135      405        99
50–54      52       1      4       4      16       64      256       100

Total             100            −51     212     −255    1,520
                                 +53             +275



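$$\Sigma f x' = +2, \qquad \Sigma f x'^2 = +212, \qquad \Sigma f x'^3 = +20, \qquad \Sigma f x'^4 = +1{,}520$$

$$\nu'_1 = \frac{\Sigma f x' \times (c.i.)}{\Sigma f} = \frac{2 \times 5}{100} = \frac{10}{100} = .1$$

$$\nu'_2 = \frac{\Sigma f x'^2 \times (c.i.)^2}{\Sigma f} = \frac{212 \times 5^2}{100} = \frac{212 \times 25}{100} = 53$$

$$\nu'_3 = \frac{\Sigma f x'^3 \times (c.i.)^3}{\Sigma f} = \frac{20 \times 5^3}{100} = \frac{20 \times 125}{100} = 25$$

$$\nu'_4 = \frac{\Sigma f x'^4 \times (c.i.)^4}{\Sigma f} = \frac{1{,}520 \times 5^4}{100} = \frac{1{,}520 \times 625}{100} = 9{,}500$$

$$\mu_1 = \nu'_1 - \nu'_1 = 0$$

$$\mu_2 = \nu'_2 - (\nu'_1)^2 = 53 - (.1)^2 = 53 - .01 = 52.99$$

$$\mu_3 = \nu'_3 - 3\nu'_1 \nu'_2 + 2(\nu'_1)^3 = 25 - 3 \times 53 \times (.1) + 2(.1)^3 = 25 - 15.9 + .002 = 9.102$$

$$\mu_4 = \nu'_4 - 4\nu'_1 \nu'_3 + 6\nu'_2 (\nu'_1)^2 - 3(\nu'_1)^4$$

$$= 9{,}500 - 4 \times 25 \times (.1) + 6 \times 53 \times (.1)^2 - 3(.1)^4$$

$$= 9{,}500 - 10 + 3.18 - .0003 = 9{,}493.1797 \text{ or } 9{,}493.18$$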








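$$\text{Corrected* } \mu_2 = \mu_2 - \frac{h^2}{12} \quad \text{(where } h \text{ is the class interval)}$$

$$= 52.99 - \frac{25}{12} = 52.99 - 2.083 = 50.907 \text{ or } 50.91$$

$$\text{Corrected } \mu_3 = \mu_3 = 9.102$$

$$\text{Corrected } \mu_4 = \mu_4 - \frac{1}{2} h^2 \mu_2 + \frac{7}{240} h^4$$

$$= 9{,}493.1797 - \frac{25}{2} \times 52.99 + \frac{7}{240} \times 625$$

$$= 9{,}493.1797 - 662.3750 + 18.2292 = 8{,}849.0339 \text{ or } 8{,}849.03$$

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{(9.102)^2}{(50.91)^3} = .000627$$

$$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{8{,}849.03}{(50.91)^2} = 3.414$$

$$\gamma_1 = \sqrt{\beta_1} = \sqrt{.000627} = .025$$

$$\gamma_2 = \beta_2 - 3 = 3.414 - 3 = .414$$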
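The whole of this illustration can be retraced in a few lines of Python. The sketch below is offered only as a check on the arithmetic: it uses step deviations from A = 32 in class-interval units, applies Sheppard's corrections to the second and fourth moments exactly as in the working above, and the variable names are illustrative. As it does not round intermediate figures, the last digits of $\beta_1$ and $\beta_2$ may differ slightly from the hand computation.

```python
import math

# Mid-points and frequencies of the distribution in the illustration.
mids = [12, 17, 22, 27, 32, 37, 42, 47, 52]
freqs = [1, 4, 8, 19, 35, 20, 7, 5, 1]
A, h = 32, 5                      # arbitrary origin and class interval
n = sum(freqs)

steps = [(m - A) // h for m in mids]   # step deviations: -4, -3, ..., 4


def moment_about_A(k):
    """kth moment about A, converted back to original units."""
    return h ** k * sum(f * d ** k for d, f in zip(steps, freqs)) / n


v1, v2, v3, v4 = (moment_about_A(k) for k in (1, 2, 3, 4))

# Moments about the mean.
mu2 = v2 - v1 ** 2
mu3 = v3 - 3 * v1 * v2 + 2 * v1 ** 3
mu4 = v4 - 4 * v1 * v3 + 6 * v2 * v1 ** 2 - 3 * v1 ** 4

# Sheppard's corrections (the third moment is left unchanged).
mu2_c = mu2 - h ** 2 / 12
mu4_c = mu4 - 0.5 * h ** 2 * mu2 + 7 * h ** 4 / 240

beta1 = mu3 ** 2 / mu2_c ** 3
beta2 = mu4_c / mu2_c ** 2
gamma1 = math.sqrt(beta1)
gamma2 = beta2 - 3

print(f"v = {v1:.1f}, {v2:.0f}, {v3:.0f}, {v4:.0f}")
print(f"mu2={mu2:.2f} mu3={mu3:.3f} mu4={mu4:.4f}")
print(f"corrected mu2={mu2_c:.3f} corrected mu4={mu4_c:.4f}")
print(f"gamma1={gamma1:.3f} gamma2={gamma2:.3f}")
```

A positive $\gamma_2$ here points to a curve somewhat more peaked than the normal, and the small positive $\gamma_1$ to a very slight positive skewness.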

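Illustration :

The first three moments about the value 2 are 1, 16 and
−40. Compute the moments about the mean and the origin.


Solution :


Moments about the Mean :


Given

$$\nu_1 = \frac{\Sigma f (X - 2)}{N} = 1 \quad \text{or} \quad \nu_1 = 1$$

$$\nu_2 = \frac{\Sigma f (X - 2)^2}{N} = 16 \quad \text{or} \quad \nu_2 = 16$$

$$\text{Similarly, } \nu_3 = \frac{\Sigma f (X - 2)^3}{N} = -40 \quad \text{or} \quad \nu_3 = -40$$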


* In this case conditions for applicability of Sheppard's correction are not
satisfied, but they have been applied to show the process employed in
their application.



$$\mu_1 = \nu_1 - \nu_1 = 1 - 1 = 0$$

$$\mu_2 = \nu_2 - (\nu_1)^2 = 16 - (1)^2 = 16 - 1 = 15$$

$$\mu_3 = \nu_3 - 3\nu_1 \nu_2 + 2(\nu_1)^3 = -40 - 3 \times 16 \times 1 + 2(1)^3 = -40 - 48 + 2 = -86$$

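Moments about the origin :

Given

$$\frac{\Sigma f (X - 2)}{N} = 1, \qquad \frac{\Sigma f (X - 2)^2}{N} = 16, \qquad \frac{\Sigma f (X - 2)^3}{N} = -40$$

Let $\nu'_1$ represent 1st moment about origin (i.e., zero),
$\nu'_2$ the 2nd moment about origin and $\nu'_3$ the 3rd moment about origin.

$$\text{Now } \nu'_1 = \frac{\Sigma f (X - 0)}{N} \text{ or } \frac{\Sigma f X}{N}$$

$$\nu'_2 = \frac{\Sigma f (X - 0)^2}{N} \text{ or } \frac{\Sigma f X^2}{N}$$

$$\nu'_3 = \frac{\Sigma f (X - 0)^3}{N} \text{ or } \frac{\Sigma f X^3}{N}$$

$$\frac{\Sigma f (X - 2)}{N} = 1 \quad \text{or} \quad \frac{\Sigma f X}{N} - \frac{2 \Sigma f}{N} = 1$$

$$\text{or } \frac{\Sigma f X}{N} - 2 = 1 \quad \text{or} \quad \frac{\Sigma f X}{N} = 1 + 2 = 3$$

$$\text{or } \nu'_1 = 3$$

$$\frac{\Sigma f (X - 2)^2}{N} = 16 \quad \text{or} \quad \frac{\Sigma f (X^2 - 4X + 4)}{N} = 16$$

$$\text{or } \frac{\Sigma f X^2}{N} - \frac{4 \Sigma f X}{N} + \frac{4 \Sigma f}{N} = 16$$

$$\text{or } \frac{\Sigma f X^2}{N} - 4(3) + 4 = 16$$

$$\text{or } \frac{\Sigma f X^2}{N} - 12 + 4 = 16$$

$$\text{or } \frac{\Sigma f X^2}{N} = 16 + 12 - 4 = 24$$

$$\text{or } \nu'_2 = 24$$

$$\frac{\Sigma f (X - 2)^3}{N} = -40 \quad \text{or} \quad \frac{\Sigma f (X^3 - 6X^2 + 12X - 8)}{N} = -40$$

$$\text{or } \frac{\Sigma f X^3}{N} - \frac{6 \Sigma f X^2}{N} + \frac{12 \Sigma f X}{N} - \frac{8 \Sigma f}{N} = -40$$

$$\text{or } \frac{\Sigma f X^3}{N} - 6(24) + 12(3) - 8 = -40$$

$$\text{or } \frac{\Sigma f X^3}{N} - 144 + 36 - 8 = -40$$

$$\text{or } \frac{\Sigma f X^3}{N} = -40 + 144 - 36 + 8 = 76$$

$$\text{or } \nu'_3 = 76$$

Thus the 1st three moments about the mean stand at the
figures of 0, 15 and −86 respectively ; and about the origin
are 3, 24 and 76 respectively.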

It may be noted that as the 1st moment about the origin is 
always equal to the value of the mean, so the mean value of 
the above distribution is 3. 
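Both parts of this illustration are instances of one operation: moving the origin from which moments are measured. A minimal Python sketch of that operation follows; it is not the book's method but an equivalent binomial expansion, and the function name `shift_moments` is hypothetical. Moving the origin from 2 to zero is a shift of 2, and moving it from 2 to the mean (which is 3) is a shift of −1.

```python
from math import comb


def shift_moments(moments, shift):
    """Moments about the origin (A - shift), given moments about A.

    moments[k - 1] is the kth moment about A. Each new moment is the
    expansion of the mean of (x' + shift)**k, where x' = X - A.
    """
    nu = [1] + list(moments)   # the zeroth moment is always 1
    return [
        sum(comb(k, j) * shift ** (k - j) * nu[j] for j in range(k + 1))
        for k in range(1, len(nu))
    ]


about_two = [1, 16, -40]                         # given moments about 2

about_origin = shift_moments(about_two, 2)       # new origin 2 - 2 = 0
mean = about_origin[0]                           # first moment about zero
about_mean = shift_moments(about_two, 2 - mean)  # new origin = mean

print(about_origin, about_mean)
```

Applying the same function to the moments about the mean, with a shift equal to the mean, should lead back to the moments about the origin.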







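Note :

The moments about the origin can also be calculated from the data of the moments about the mean, viz.,

$$\mu_1 = \frac{\sum fx}{N} = \frac{\sum f(X-\bar{X})}{N}$$

As Mean or $\bar{X} = A + v_1$, where $A$ is the assumed average,

$$\bar{X} = 2 + 1 = 3$$

We know $\mu_1 = 0$, i.e., $\frac{\sum f(X-3)}{N} = 0$.

Similarly,

$$\mu_2 = \frac{\sum f(X-3)^2}{N} = 15, \qquad \mu_3 = \frac{\sum f(X-3)^3}{N} = -86$$

And from these the values of $v_1'$ or $\frac{\sum fX}{N}$; $v_2'$ or $\frac{\sum fX^2}{N}$; and $v_3'$ or $\frac{\sum fX^3}{N}$ can easily be calculated.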






EXERCISES 

1. What is skewness ? How would you find it in a non-symmetrical distribution ?

2. Distinguish between :

(a) Dispersion and skewness
(b) Positive and negative skewness
(c) Quartile deviation method of measuring skewness and Pearson's measure of skewness.

3. Explain the three terms : dispersion, skewness and kurtosis.

4. Give a suitable formula for measuring the peakedness (kurtosis) of a frequency distribution.

5. Find Bowley's coefficient of skewness for Qns. 27 and 28 of central tendency. [3.22]

6. Find Karl Pearson's coefficient of skewness from the following data :
Marks above 0 10 20 30 40 50 60 70 80

No. or Students 150 140 100 00 80 70 30 14 0 

[M A , Ret},. f i 291 

7. Compute quartile coefficient of dispersion and skewness of the following data :

Central Size 1 2 3 4 5 6 7 8 9 10

Vtequency 2 9 It 14 20 24 20 U, 5 2 

(B.Com., Agra, 1957) [3.26]

8. Find mean, median, standard deviation and a coefficient of skewness from the following data of age of the students of a school :

Age 5-7 8-10 11-13 14-16 17-19

No. of students 7 12 19 10 2

[3.27]

9. Find the coefficient of skewness for the following distribution :

Variable 0- 5- 10- 15- 20- 25- 30- 35-40

Frequency 2 ;> 7 13 21 10 8 3 

(B.Com., Delhi, 1956) [3.28]

10. (a) Karl Pearson's coefficient of skewness of a distribution is 0.32. Its standard deviation is 6.5 and mean is 29.6. Find the mode and median.
(b) If the mode of the above distribution is 24.8, what will be the standard deviation ?

(c) If for a distribution -0.36 is Bowley's coefficient of skewness, Q1 = 8.6 and Me. = 12.3, find the quartile coefficient of dispersion.

[3.41]

11. Compute the first four moments about an arbitrary origin from the following data of heights in inches of adult Irishmen :

Heights 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73

Adulta i G 2 2 7 15 33 58 73 62 40 25 15 10 3 

{M*A. t Panjek, 194 a) (10.1) 



12. Compute the first four moments about arithmetic mean from the following data :

Variable value 5 10 15 20 25 30 35
Frequency 8 15 20 32 23 17 5

[10.3]

13. Calculate the first four moments about the mean for the following data :

x (Mid-points of Class Interval) 1 2 3 4 5 6 7 8 9
f (frequency) 1 6 13 25 30 2? 9 5 2

(M.A., Aligarh) [10.4]

14. Following figures relate to the marks obtained by 20 students of a class in a test. Find the first four moments about assumed mean 10, first four moments about the arithmetic mean and the values of β1 and β2.

1. 9, 12, 2. 10. 15, 3. 16, 19, 5, )8, 17, t\ 21), 0, h, :>, n, 12.

[10.8]

15. The first four moments of a distribution about the value 4 are -1.5, 17, -30 and 108. Calculate the moments about the mean.

(M.A., Delhi) [10.2]

16. Calculate the moments for the following :

Marks : 20-30, 30-40, 40-50, 50-60, 60-70, 70-80, 80-90, 90-100
Frequency : h 28 % 75 5b 30 8 l

(M.A., Madras, 1945) [10.5]

17. Calculate first four moments, β1 and β2 after applying Sheppard's correction :

Wages in Rs. : 10- 20- 30- 40- 50- 60- 70- 80
No. of Persons : 15 23 35 19 32 28 12 0

[10.6]

18. Find out the mean, mean deviation and kurtosis of the following data :

Class Interval : 0-10 10-20 20-30 30-40
Frequency : 14 4 2

(M.Com., Agra, 1957) [10.7]

19. Calculate coefficient of skewness by all the methods known for the following data on age of 250 persons in a sample study :

Ages below : 10 20 30 40 50 60 70 80 90
No. of Persons : 15 35 60 «4 % 127 188 200 250

[10.9]

20. For the following four series find out the mean, standard deviation, skewness and kurtosis and comment on the results :

Class Interval :  0-5  5-10  10-15  15-20  20-25  25-30  30-35
Freq. series A :   0    12     24     28     24     12      0
  ,,    ,,   B :   4     4     20     44     20      4      4
  ,,    ,,   C :   0     4     40     24     20      8      4
  ,,    ,,   D :   4     8     20     24     40      4      0

[10.12]

21. For a certain distribution of 25 items, the calculated mean, standard deviation, β1 and β2 were respectively 54.23", 1.79", 0.053 and 8.16. It was found at the time of checking that items t>3 9 and 48.8" were wrongly written instead of correct items 50.4" and
Calculate the correct constants. [10.10]




22. X, .{** and p* were found for 100 items as 50, 12, 4 and 2H»G0Q
respectively. If at the time of calculations 'rf> and 49 were wrongly
written as 50 and 40, find the correct constants. [10.11]

23. Compile a table showing the frequencies with which words of different
number of letters occur in the extract reproduced below (omitting
punctuation marks), treating as the variable the number of letters in
each word, and calculate the values of v (assuming ? as the arbitrary
mean) and μ for the first four moments.

Success in the examination confers no absolute right to appointment,
unless Government is satisfied, after such enquiry as may be considered
necessary, that the candidate is suitable in all respects for appointment
to the public service.

24. For a certain group of workers the median and quartile earnings per
week are Rs. X* t 52 and 5ft respectively. The earnings for the group
range between Rs. 4b and Rs, ti>\ Ten per cent of the group earn
under Rs. 49 per week, III per cent earn Rs. fil and over, and <3
per cent Rs. *34 and over.

Taking Rs. % 5 as the assumed average earning, calculate the values
of β1 and β2 and throw light on the presence or otherwise of the
skewness and kurtosis in the above frequency distribution.



Chapter 13 

Probability 


In everyday conversation, chance, likelihood and probability
are used as synonyms. We may come across people saying that
the chance of the home team winning the cricket match
against the Aussies is low; that the likelihood that there is life on
the moon is not very high; and that the probability of getting a head
when a coin is tossed is one-half.

In statistics, however, we give the term probability a specific
meaning which may not exactly be the same as the one given
in everyday conversation. Here we treat (the principles of)
the theory of probability as a branch of mathematics. In fact
the theory of probability is a common and inseparable link
between mathematics and statistics, as the development of the
theory of probability is in the domain of mathematics, but the
application of this theory takes place in statistics (in the form
of random sampling, etc.).

According to Laplace, probability is the ratio of the number
of 'favourable' cases to the total number of equally likely
cases. Stated in other words, "If there are several equally
likely events that may happen, the probability that any one of
these events will happen is the ratio of the number of cases
favourable to its happening to the total number of possible
exclusive and exhaustive cases." For example, if a coin is
tossed, there are two equally likely exclusive and exhaustive
results, a head or a tail; hence the probability of a head is 1/2
and similarly the probability of a tail is 1/2.

We may state probability as the proportion of two frequencies
of occurrence:

$$p = \frac{s}{s+f}$$

where p is the proportion or ratio or probability of the happening
of an event, s is the number of equally likely exclusive and
exhaustive events happening (i.e., successful or favourable
events, or simply 'successes'), and f is the number of equally
likely exclusive and exhaustive events failing to happen (i.e.,
unsuccessful or unfavourable events or simply 'failures').

Let q stand for the proportion of an event failing to happen,
then the probability of the non-happening of an event will be

$$q = \frac{f}{s+f}$$

Since an event can either occur or fail to occur, the sum of
the probability of the occurrence (happening) and non-occurrence
(non-happening) is exactly 1, or p + q = 1. Thus if either
of these (p and q) is known, the other can easily be found out,
viz., p = 1 - q and similarly q = 1 - p.

From the above statement it is amply clear that in no case
can probability exceed the limit of 1. Similarly the other limit
of it is zero. Thus it varies from zero to one. If p = 0, then
q = 1 and vice versa. In fact when the figure is either exactly 1,
or exactly zero, strictly speaking, probability loses its existence
and is completely replaced by certainty.

We may now state

$$p = \frac{s}{N} \quad\text{and}\quad q = \frac{f}{N}$$

where N is the sum of s and f, or the total number of events.

Now, to find the value of p, one should only know the
number of (equally likely) successful events (s) and the total
number of (equally likely exclusive and exhaustive) events (N);
and for the value of q, the number of unsuccessful (equally
likely) events (f) and the total number of (equally likely)
events (N).

Thus probability may be defined as the total number of
cases favourable to the event, divided by the total number of
possible cases, provided that the latter are equally likely,
exclusive and exhaustive. This statement indicates the procedure
to be employed for finding out the probability or chance in
any particular problem. The procedure is :

(i) Count all the possible and equally likely cases. In the
case of a die the number is six. In the case of a coin
the number is two.

(ii) Next, count all those cases which are favourable to
the event in question.

(iii) Then, the probability of the event is the fraction whose
numerator is the number obtained by step (ii) and
whose denominator is the number obtained by step (i).
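
The three-step procedure can be sketched in Python as follows. This is only an illustration under the stated assumption that all listed cases are equally likely, exclusive and exhaustive; the function name `classical_probability` is hypothetical.

```python
from fractions import Fraction


def classical_probability(favourable, possible):
    """Step (iii): favourable cases divided by all equally likely cases."""
    if possible <= 0 or not 0 <= favourable <= possible:
        raise ValueError("need 0 <= favourable <= possible and possible > 0")
    return Fraction(favourable, possible)


# Step (i): count all possible, equally likely cases (the six faces of a die).
die_faces = [1, 2, 3, 4, 5, 6]
# Step (ii): count the cases favourable to the event (here, throwing an ace).
aces = [face for face in die_faces if face == 1]

p_ace = classical_probability(len(aces), len(die_faces))
q_ace = 1 - p_ace   # probability of the event failing to happen
print(p_ace, q_ace)
```

Exact fractions are used so that p + q = 1 holds without rounding error.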

Illustrations :

(a) What is the probability of an ace in the throw of a
die ? First, all the possible cases with one die will be six, the
number of its faces. Second, the number of favourable cases is
one because only one side can come upward which is the same
as the event in question. Hence the probability we are looking
for is the fraction whose numerator is one and whose denominator
is 6, i.e., it is 1/6.

(b) What is the chance of drawing a King in a draw from
a pack of 52 cards ?

First, total number of cases = total number of cards = 52.

Second, total no. of favourable cases = total no. of Kings in
the pack of cards = 4.

∴ The probability = 4/52 or 1/13.

(c) If a bag contains 4 white balls and 4 black balls, all
alike except for colour, what is the chance of drawing at
random one white ball ?

First, total no. of cases = total no. of balls = 8.

Second, total no. of favourable cases = total no. of
white balls = 4.

∴ The probability = 4/8 or 1/2.

(d) What is the probability of a King in a pinochle deck ?
(A pinochle deck consists of 2 Aces, 2 Kings, 2 Queens, 2 Jacks,
2 tens and 2 nines of each suit. There are no cards of low
value.)

First, total no. of cases = total no. of cards = 48.

Second, total no. of favourable cases = total no. of Kings
in the deck = 8.

∴ The probability = 8/48 or 1/6.

Each probability that can be expressed numerically, takes 
the shape of a fraction. The value of the fraction can never 
exceed one, i.e., the probability can never be more than one.
A probability such as say *j , can never happen. If the 
probability of an event is, say, 1, it means that all equally
likely cases are favourable, i.e., the event in question is absolutely
certain to happen. On the other hand, if the probability
of an event is, say, 0, it means that none of the equally likely
cases is favourable, i.e., the event is absolutely impossible.
The examples of the probability being equal to
unity are : (i) That I shall some day die, (ii) That either
head or tail will appear in tossing a coin. An example of
the probability being equal to zero is the probability of
getting 5 kings from an ordinary pack of 52 cards.

The probability expressed as a fraction can be translated into
percentage form. The probability of drawing a 'heart' from
an ordinary pack is 13/52, i.e., 1/4, or we might say the probability
of drawing a heart is 25%.

Equally Likely Events

It has been stated earlier that probability is the ratio of the 
number of 'favourable cases' to the total number of 'equally 
likely cases'. Now one may ask "What do we mean by 'equally
likely cases' ?" "Is not 'equally likely' merely another way of
saying 'equally probable' ?" "Is this definition of probability
not a circular one, since it defines probability in terms of itself ?"
"Is this not the same thing as defining a horse as a horse-like
animal ?" In fact equally likely cases are those which lead us
to think that any of them may happen as often as any other. 
In one throw of a die, we may get either spot one, or spot 
two, or spot three, or spot four, or spot five or spot six. Thus 




any of them may turn up and hence they are equally likely ;
for this the die must be symmetrical in construction. If a bag
contains, say, eight balls and all of them are in different colours,
then in a draw of a ball, they will all be equally likely, provided
the balls are identical in construction and so well mixed
that any one of them can be selected.

The illustration below will clearly indicate the meaning of
'equally likely' :

A college has five types of courses : M.A., M.Com., B.A.,
B.Com., and B.A. (Hons.). What is the chance that a student
picked at random from the college belongs to M.A. ?

One may argue : There are five courses. Out of these one
is favourable to M.A. Hence the chance that the student
picked at random will be from the M.A. course = 1/5. But this
result may or may not be correct. Here we are concerned
with the students studying in various courses in the
College. It is quite possible that the numbers of students studying
in the various courses are not the same. The chance of selection of a
course in the present case is directly proportional to the number
of students in it. If the numbers of students in the various courses
are equal, the calculation made above is correct. But if they are
not equal (which is more likely), all the courses will not be
equally likely. Then the correct value of the required probability
will be the number of students studying for M.A. divided
by the total number of students in the College.

Let us now take a different problem from the same situation.
What is the chance that the student or students standing first
in M.A. get the highest percentage of marks amongst all the
courses ? In this case each course is represented by one percentage
figure obtained by the student or students who stand
first in that course. There are five courses or five such percentage
figures. The number of cases favourable to the student
from the M.A. course getting the highest percentage of marks is
one.* So the required probability will be 1/5. Here all the
courses are equally likely. Hence the argument given in the
beginning holds good in this case.

* We are assuming that other unknown factors like standard of marking,
suitability of question paper etc. affect uniformly all the courses. Without
this assumption the problem cannot be solved.

Permutations and Combinations

Probability problems usually involve the application of
elementary algebra of permutations and combinations. It is
intended to discuss below some of the important rules of
permutations and combinations.

Permutations 

An arrangement of a finite number of distinguishable things,
objects or elements in a specified order is called a permutation.
Thus the word man is a special arrangement of three letters
m, a and n. Other possible arrangements of these three letters
are anm, nma, nam, amn, mna, i.e., in all 6 possible arrangements ;
and these arrangements are called permutations.

For one object or element, there is only 1 order ; with two
objects or elements, there are 2 possible arrangements : 1, 2
and 2, 1. For three objects or elements, there are 6 possible
arrangements or orders, viz., 1, 2, 3; 2, 3, 1; 3, 1, 2; 3, 2, 1; 2, 1,
3; and 1, 3, 2. Similarly 24 possible arrangements can be
obtained by writing down all the possible orders of the numbers
1, 2, 3, 4.

In general, if there are n different things or elements, it is
possible to form n! (read as n factorial, i.e., n(n-1)(n-2)...1)
different arrangements or permutations. Thus if there are 3
letters then the total no. of permutations will be n!, i.e., 3!, viz.,
3 × 2 × 1 = 6. In this case, in making different arrangements, it
is possible to pick the 1st letter in 3 (in general n) different
ways. After picking the 1st letter, there are then left two
(in general n - 1) different ways for the selection of the 2nd
letter. Finally, there is left only one (in general n - 2), and
there is only one way for the selection of the last letter.*

Now each one of the two \va\$ of the selection of the 2nd 
letter <au be combined with each of the three way* possible for 

* hr dm ease * it 1 Hat * nu> t* number, and in that rate 

\till new he the final «PI\ 




the selection of the 1st letter, so that there are 3 × 2 various
ways of picking the first two letters in this case. There is
only one way left (in every case) for the selection of the last
(3rd in this case) letter, which when combined with the first
two letters, will give 3 × 2 × 1 = 6 different ways of picking all
the three letters. Thus the number of different arrangements
or permutations for 3 things is 3!, or n! when the number of
things is n. If there had been 5 different letters, the total
number of permutations would be n! where n would be 5, i.e.,
5 × 4 × 3 × 2 × 1 = 120, and so on.
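
The counting argument above can be verified by listing the arrangements directly. The sketch below uses Python's standard `itertools.permutations` and `math.factorial`; the variable names are illustrative.

```python
from itertools import permutations
from math import factorial

letters = "man"
# Every ordering of the three distinct letters m, a and n.
arrangements = ["".join(p) for p in permutations(letters)]

print(arrangements)
print(len(arrangements), factorial(len(letters)))
```

The number of listed arrangements agrees with n! for n = 3, and `factorial(5)` gives the count for five different letters.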

Illustration :

There are six seats available in a compartment. In how
many ways can six persons be seated ?

Solution :

Six persons can be seated in six seats in 6! ways,
or 6! = 6 × 5 × 4 × 3 × 2 × 1 = 720 ways.

Illustration :

In how many different ways can 10 different books be
arranged in a shelf ?

Solution :

n different books can be arranged in n! ways, or in 10! ways;
10! = 10 × 9 × 8 × 7 × 6 × 5 × 4 × 3 × 2 × 1 = 36,28,800.

Number of different permutations of n things taken r at
a time

Suppose that among 10 dissimilar or different things 4
are to be selected for some particular purpose. Then the
problem is : In how many different ways can a sub-group
of 4 be selected from the total of 10, i.e., what is the number of
permutations of 10 things taken 4 at a time ? The solution is as follows :

The possible selection of the first of the sub-group of 4 is in
10 (or n) different ways ; the second in 9 (or n - 1) different
ways, the third in 8 (or n - 2) different ways and the fourth in
7 (or n - 3) different ways, which can also be expressed as
n - r + 1, where r = 4, as 4 (r) things are to be arranged out of
10 (n) things. There are thus 10 × 9 × 8 × 7, i.e.,
n(n-1)(n-2)(n-3) = 5,040 different permutations of 10
different things taken 4 (or r) things at a time.

The general formula for the number of ways of selecting r
objects out of n different objects, and arranging them in order, is

$${}^{n}P_{r} = n(n-1)(n-2)\cdots(n-r+1) = \frac{n!}{(n-r)!}$$

Illustration :

In how many different orders can 3 books at a time be
arranged out of 5 different books on a shelf ?

Solution :

Applying the formula

$${}^{n}P_{r} = {}^{5}P_{3} = \frac{5!}{(5-3)!} = \frac{5 \times 4 \times 3 \times 2 \times 1}{2 \times 1} = 60$$
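
As a check on the formula, the following Python sketch computes the number of permutations of n things taken r at a time from factorials and compares it with the standard-library function `math.perm` (available from Python 3.8). The helper name `n_p_r` is illustrative.

```python
from math import factorial, perm


def n_p_r(n, r):
    """Permutations of n different things taken r at a time: n! / (n - r)!."""
    if not 0 <= r <= n:
        raise ValueError("need 0 <= r <= n")
    return factorial(n) // factorial(n - r)


print(n_p_r(10, 4))   # 10 things taken 4 at a time
print(n_p_r(5, 3))    # 5 books taken 3 at a time
print(perm(10, 4))    # same count from the standard library
```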

Number of permutations of n things, of which n1 are all
alike and of one kind, n2 all alike and of another kind
and so on, is

$$\frac{n!}{n_1!\, n_2!\, n_3! \cdots}, \quad\text{where } n = n_1 + n_2 + n_3 + \cdots$$

It is evident from the above formula that the number of
permutations will not be equal to n!. The reason is obvious :
n! will give the number of specified arrangements when n things
are all different, but in case some things are exactly alike and
the others are all different, the total number of permutations
will be less than n!.

Illustration :

In how many ways can the letters of the word 'Statistics'
be arranged in specified orders ?

Solution :

Here 's' occurs thrice, as also 't' ; 'i' occurs twice ; and the
total number of all the letters is 10. So n1 = 3, n2 = 3 and n3 = 2,
the remaining letters 'a' and 'c' occurring once each, and n = 10.

Applying the necessary formula,

$$\frac{n!}{n_1!\, n_2!\, n_3!} = \frac{10!}{3!\, 3!\, 2!} = \frac{10 \times 9 \times 8 \times 7 \times 6 \times 5 \times 4 \times 3 \times 2 \times 1}{3 \times 2 \times 1 \times 3 \times 2 \times 1 \times 2 \times 1} = 50{,}400$$

Illustration :

In how many ways can 12 books be assigned to 4 shelves,
containing 5, 4, 2 and 1 books respectively ?

Solution :

Let n be the sum of n1, n2, n3 and n4 representing the shelves
containing 5, 4, 2 and 1 books respectively.

Applying the formula,

$$\frac{n!}{n_1!\, n_2!\, n_3!\, n_4!} = \frac{12!}{5!\, 4!\, 2!\, 1!} = \frac{12 \times 11 \times 10 \times 9 \times 8 \times 7 \times 6}{4 \times 3 \times 2 \times 1 \times 2 \times 1} = 83{,}160$$

In case each of these 4 shelves were to contain an equal number
of books, i.e., 3 each, then the above formula will be changed to :

$$\frac{n!}{k!\; n_1!\, n_2!\, n_3!\, n_4!} = \frac{n!}{k!\; (n_1!)^4}, \quad\text{since } n_1 = n_2 = n_3 = n_4$$

where $k$ is the number of shelves containing an equal number
of books.

Substituting the figures in this amended formula (which is
used for equal sub-division),

$$\frac{12!}{4!\, 3!\, 3!\, 3!\, 3!} = \frac{12 \times 11 \times 10 \times 9 \times 8 \times 7 \times 6 \times 5}{3 \times 2 \times 3 \times 2 \times 3 \times 2 \times 3 \times 2} = 15{,}400$$
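
The three results above (50,400; 83,160 and 15,400) can be reproduced with the Python sketch below. The function names are illustrative, not from the text; `math.prod` requires Python 3.8 or later.

```python
from collections import Counter
from math import factorial, prod


def arrangements_with_repeats(group_sizes):
    """n! / (n1! n2! ...), where n is the sum of the group sizes."""
    n = sum(group_sizes)
    return factorial(n) // prod(factorial(size) for size in group_sizes)


def arrangements_of_word(word):
    """Distinct orderings of the letters of a word with repeated letters."""
    return arrangements_with_repeats(list(Counter(word).values()))


def equal_subdivisions(group_count, group_size):
    """The amended formula for equal sub-division: n! / (k! (m!)^k)."""
    n = group_count * group_size
    return factorial(n) // (factorial(group_count) * factorial(group_size) ** group_count)


print(arrangements_of_word("statistics"))
print(arrangements_with_repeats([5, 4, 2, 1]))
print(equal_subdivisions(4, 3))
```

Letters occurring only once contribute a factor of 1! and so do not alter the result.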

Combinations

A combination is not the same thing as a permutation.
A group of, say, 3 letters constitutes a combination of these
3 letters, but this combination can be arranged in 3! ways,
which we have already seen in the discussion of permutations.
In other words, it is possible to have 3! (or n!) permutations of
a single combination of 3 (or n) things.

Although a group of n things forms but a single combination,
sub-groups can be picked in such a manner as to constitute
different combinations. For instance, to pick a committee of
3 out of 10 persons, i.e., to find the different ways in which such a
committee can be formed, is a problem not of permutations but
of combinations, as here we want to know in how many ways we
can pick out a number of persons (or things) from a group
without caring in what order they are arranged. Thus by a
combination is meant any one of the possible selections
from n different things, objects or elements without regard to
order.

Number of combinations of n dissimilar things taken r
at a time :

The number of ways of picking out r things from n dissimilar
things is denoted by

$${}^{n}C_{r} = \frac{n(n-1)(n-2)\cdots(n-r+1)}{r(r-1)(r-2)\cdots 1} = \frac{n!}{r!\,(n-r)!}$$








Illustration :

In how many different ways can a committee of two be
selected from 25 persons ?

Solution :

Applying the formula

$${}^{n}C_{r} = {}^{25}C_{2} = \frac{25!}{2!\,(25-2)!} = \frac{25!}{2!\, 23!} = \frac{25 \times 24}{2 \times 1} = 300$$

Illustration :

(i) It is proposed to draw a random sample of 3 towns
from a total of 15 towns, the latter being divided into two
strata of 10 and 5 towns. If a sample of two is drawn from the
former and a sample of 1 from the latter, how many different
samples of 3 are possible ?

(ii) How many samples of 3 are possible if no stratification
is applied ?

Solution :

When no stratification is applied :

The total number of samples of 3 from a total of 15 towns will
be given by :

$${}^{15}C_{3} = \frac{15!}{3!\,(15-3)!} = \frac{15!}{3!\, 12!} = \frac{15 \times 14 \times 13}{3 \times 2} = 455$$

When stratification is applied :

A sample of 2 from 10 towns is given by ${}^{n}C_{r} = {}^{10}C_{2}$.
Similarly a sample of 1 from 5 towns is given by ${}^{n}C_{r} = {}^{5}C_{1}$.
So the total number of different samples of 3 will be

$${}^{10}C_{2} \times {}^{5}C_{1} = \frac{10 \times 9}{2} \times 5 = 45 \times 5 = 225$$
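
These counts can be confirmed with `math.comb` from the Python standard library (Python 3.8 or later). The variable names below are illustrative.

```python
from math import comb

# Samples of 3 towns out of 15 with no stratification.
unstratified_samples = comb(15, 3)

# 2 towns from the stratum of 10 combined with 1 town from the stratum of 5.
stratified_samples = comb(10, 2) * comb(5, 1)

# Committees of two from 25 persons, as in the earlier illustration.
committees = comb(25, 2)

print(unstratified_samples, stratified_samples, committees)
```

Stratification permits fewer distinct samples because every sample must respect the fixed allocation between the two strata.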




Relationship between combinations and permutations

We know

$${}^{n}C_{r} = \frac{n!}{r!\,(n-r)!} \quad\text{and}\quad {}^{n}P_{r} = \frac{n!}{(n-r)!}$$

so that

$${}^{n}C_{r} = \frac{{}^{n}P_{r}}{r!} \quad\text{and}\quad {}^{n}P_{r} = {}^{n}C_{r} \times r!$$

It is obvious from the above explanations that combinations
refer to different sub-groups whereas permutations,
in addition, refer to the possible arrangements within such different
sub-groups.

Problems on Probability using the concept of Combinations
and Permutations

(i) What is the probability of getting 3 white balls in a
draw of 3 balls from a box containing 5 white and 4 red balls ?

Solution :

The number of cases favourable to the selection of 3
white balls is the same as the number of ways of getting
3 balls out of the 5, i.e., ${}^{5}C_{3}$.

Similarly the number of all possible exclusive and equally
likely cases is the same as the number of ways of selecting any
three balls (white or red) out of the total of 5 + 4 or 9 balls in
all, i.e., ${}^{9}C_{3}$.

∴ The required probability $= {}^{5}C_{3} / {}^{9}C_{3} = \frac{10}{84} = \frac{5}{42}$.

(ii) In a game of bridge what is the chance that a specified
player gets all the four Kings ?

Solution :

A player gets 13 cards in the game. The total number of
ways in which a particular person can get any of the 13 cards
is ${}^{52}C_{13}$. The favourable cases to the given event are those
where the player gets out of 4 Kings all the four and any 9 out
of the remaining cards.

4 Kings can be obtained in ${}^{4}C_{4}$ ways and any 9 out of the
remaining 48 cards can be obtained in ${}^{48}C_{9}$ ways.

∴ The number of favourable cases is ${}^{4}C_{4} \times {}^{48}C_{9}$.

The required probability $= \dfrac{{}^{4}C_{4} \times {}^{48}C_{9}}{{}^{52}C_{13}}$.

(iii) In an urn there are 5 white and 4 black balls. What
is the probability of drawing the first ball white, the second
black, the third white, the fourth black, and so on, if they are
drawn one at a time ?

Solution :

The problem is similar to the one relating to the arrangement
of 9 balls (5 white + 4 black) in 9 places, which can be
arranged in 9! ways.

5 white balls are to occupy 5 odd places, viz., 1, 3, 5, 7 and
9, which can be arranged in 5! ways, and the 4 black balls are
to occupy 4 even places, viz., 2, 4, 6 and 8, which can be
arranged in 4! ways.

So the required probability is

$$\frac{5!\; 4!}{9!} = \frac{4 \times 3 \times 2}{9 \times 8 \times 7 \times 6} = \frac{1}{126}$$
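
The three problems above can be evaluated exactly with the Python sketch below, which follows the same counts of favourable and possible cases. The variable names are illustrative; the bridge probability is left unevaluated in the text, and the code simply reduces the stated ratio.

```python
from fractions import Fraction
from math import comb, factorial

# (i) 3 white balls in a draw of 3 from 5 white and 4 red balls.
p_three_white = Fraction(comb(5, 3), comb(9, 3))

# (ii) A specified bridge player holds all four Kings among 13 cards.
p_four_kings = Fraction(comb(4, 4) * comb(48, 9), comb(52, 13))

# (iii) White and black balls drawn alternately from 5 white and 4 black.
p_alternating = Fraction(factorial(5) * factorial(4), factorial(9))

print(p_three_white, p_four_kings, p_alternating)
```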

Simple and Compound Events

When two or more events occur simultaneously, their joint
occurrence is known as a compound event, whereas in a simple
event only one event occurs at a time. For instance, if two coins
are tossed together, we may at a time get two heads, one head
and one tail, one tail and one head, or two tails (i.e., no heads).
In other words, different combinations may take place between
the sides of the first coin and the sides of the second coin.
Similarly if two dice are thrown simultaneously, any number of
spots from 1 to 6 on the first die can be combined with any
number from 1 to 6 on the second die. In all, 36 different
combinations may be formed, which will be as follows :



(1st die - 2nd die)

1-1   2-1   3-1   4-1   5-1   6-1
1-2   2-2   3-2   4-2   5-2   6-2
1-3   2-3   3-3   4-3   5-3   6-3
1-4   2-4   3-4   4-4   5-4   6-4
1-5   2-5   3-5   4-5   5-5   6-5
1-6   2-6   3-6   4-6   5-6   6-6


From the above table, one can easily find out the number of
different possible ways in which each total
from 2 to 12 can be made up. Thus there is only one combination, 1-1
(one-one), which will give a total of two, and as the number of
all the possible combinations here is 36, the probability
of this total 2 is 1/36, assuming that the dice are uniform
and well-balanced so that all sides are equally likely to appear.
However, a total of 7 can be made up in 6 different ways :
1-6, 2-5, 3-4, 4-3, 5-2 and 6-1, so that the probability of a
total of 7 is 6/36. The numbers of possible cases for the different
totals may be put below in an easy form which can quickly
be grasped and retained in one's memory.


Totals                  2   3   4   5   6   7   8   9   10   11   12

No. of possible cases   1   2   3   4   5   6   5   4    3    2    1

The sum of the probabilities of all possible totals is exactly
1, viz.,

$$\frac{1}{36} + \frac{2}{36} + \frac{3}{36} + \frac{4}{36} + \frac{5}{36} + \frac{6}{36} + \frac{5}{36} + \frac{4}{36} + \frac{3}{36} + \frac{2}{36} + \frac{1}{36} = 1$$
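
The table of totals can be rebuilt by enumerating the 36 combinations, assuming as above that the dice are uniform and well-balanced. The following Python sketch uses illustrative variable names.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely (1st die, 2nd die) combinations.
outcomes = list(product(range(1, 7), repeat=2))

# Number of combinations giving each total from 2 to 12.
ways = Counter(first + second for first, second in outcomes)

# Probability of each total = favourable combinations / 36.
probabilities = {total: Fraction(count, len(outcomes)) for total, count in ways.items()}

for total in range(2, 13):
    print(total, ways[total], probabilities[total])
```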

Illustration :

Find the probability of throwing a total of 2 at least once in
24 throws with two dice.

Solution :

Probability of not getting a total of 2 in one throw with two
dice $= 1 - \frac{1}{36} = \frac{35}{36}$.

So the probability of not getting the total of 2 even once in
24 throws will be $\left(\frac{35}{36}\right)^{24} = 0.509$.

∴ The probability of getting a total of 2 at least once $= 1 - 0.509 = 0.491$.

Thus the probability of throwing a total of 2 (or double
one) at least once in 24 throws with two dice is 0.491.
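
The arithmetic of this solution can be reproduced in a few lines of Python. The sketch assumes, as the solution does, that the 24 throws do not influence one another; variable names are illustrative.

```python
from fractions import Fraction

p_double_one = Fraction(1, 36)        # total of 2 in a single throw of two dice
p_miss_once = 1 - p_double_one        # 35/36
p_miss_all = p_miss_once ** 24        # no total of 2 in any of the 24 throws
p_at_least_once = 1 - p_miss_all

print(round(float(p_miss_all), 3), round(float(p_at_least_once), 3))
```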

Addition Theorem 


In a throw of a single die, the probability of getting spot (2)
is 1/6. The probability of, say, (4) is also 1/6 and this is also the
probability of (6). Further, in a throw of a die, the probability
of getting an even number is 1/2, for the reason that three
out of the six equally likely cases are even numbers. We can
also say that the probability of an even number (2 or 4 or 6) is
3/6 = 1/2. But the individual probability of 2 is 1/6, of 4 is 1/6 and
of 6 is 1/6. Hence follows the theorem that the chance of any
one of a number of mutually exclusive events is the sum of their
individual probabilities.

To take another illustration, the probability of a King in a
pack of 52 cards is 4/52, i.e., 1/13, and the probability of a Queen
is likewise 1/13. Thus the probability of a King
or Queen in a single draw is 1/13 + 1/13 = 2/13.

A bag contains 30 balls, of which 4 are white, 5 are black,
6 are red, 7 are green and 8 are blue. What is the probability
of getting a white or red ball at random in a single draw
of one ?

The probability of getting one white = 4/30
The probability of getting one red = 6/30
The probability of getting one white or red = 4/30 + 6/30
= 10/30
or 33⅓%

The rule for finding out the probability of the happening
of mutually exclusive events may be stated as follows :

The probability of either (or any one) of two (or more)
mutually exclusive events occurring is equal to the sum of their
individual probabilities. If we write P(A or B) for "the probability
of either A or B", we can denote this rule as :

$$P(A \text{ or } B) = P(A) + P(B)$$

This can be extended to three or more mutually exclusive
events as below :

$$P(A \text{ or } B \text{ or } C \ldots) = P(A) + P(B) + P(C) + \ldots$$

Mutually Exclusive

We stated above that the probability of either of two
mutually exclusive events (attributes) is the sum of their individual
probabilities. 'Mutually exclusive events' means that the
occurrence of one event prevents the possibility of the other and
vice versa. In a toss of a coin the appearance of head prevents
the appearance of tail. In a throw of a die, the appearance of
any one side prevents the appearance of the other five sides.
Thus all the six cases are mutually exclusive.

If the events in question are not mutually exclusive, the
above addition theorem fails. This is shown by the illustration
below.

Illustration :

A bag contains 20 balls marked with the first twenty
numerals ; one is drawn at random. Find the chance that it is
a multiple of 5 or of 7.

The chance that the number is a multiple of 5 is 4/20 (because
the favourable numbers are 5, 10, 15, 20).

The chance that the number is a multiple of 7 is 2/20
(because the favourable numbers are 7, 14).

Thus the chance that the number is a multiple of 5 or 7 is
4/20 + 2/20 = 6/20, i.e., 30%.

But if the question had been : Find the dwmee that the 
number is a multiple of 3 or 5, it amid hw>c bun wnmg to turnm 
mffillms : 

The chance that the number is a multiple of 

$ m$ ln (favourable numbers are 3, 6, 9, 12* 15, 18) 





PROBABILITY 


299 


The chance of its being a multiple of
5 is 4/20 (5, 10, 15, 20)

The chance that the number is a multiple of
3 or 5 = 6/20 + 4/20 = 10/20

But this answer is wrong, because the number on a ball
might be a multiple both of 3 and of 5 ; so that the two events
are not mutually exclusive. In the example the number 15 is
counted twice in favourable events. The correct answer is,
therefore, 9/20.
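
The two cases can be checked by listing the balls and counting the favourable ones. What follows is only an illustrative sketch in Python and is not part of the original text ; the names `balls` and `chance` are chosen here for the illustration.

```python
from fractions import Fraction

balls = range(1, 21)  # balls marked with the first twenty numerals


def chance(condition):
    """Chance that a ball drawn at random satisfies `condition`."""
    favourable = [n for n in balls if condition(n)]
    return Fraction(len(favourable), len(balls))


# Multiples of 5 and of 7: no ball is both, so the addition theorem holds.
p5 = chance(lambda n: n % 5 == 0)
p7 = chance(lambda n: n % 7 == 0)
p5_or_7 = chance(lambda n: n % 5 == 0 or n % 7 == 0)
print(p5 + p7, p5_or_7)

# Multiples of 3 and of 5: ball 15 is both, so the plain sum counts it twice.
p3 = chance(lambda n: n % 3 == 0)
p3_or_5 = chance(lambda n: n % 3 == 0 or n % 5 == 0)
print(p3 + p5, p3_or_5)
```

The first pair of figures agree ; the second pair differ by exactly 1/20, the chance of drawing the ball numbered 15.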

Further, the addition theorem can be validly applied only
when the mutually exclusive events (attributes) belong to the
same set. To show this point Von Mises has given the
following example :

Suppose the probability of a man dying between his 40th
and 41st birthdays is 0.011 and the probability of his marrying
between his 41st and 42nd birthdays is 0.009. These events
are mutually exclusive but it cannot be said that the
probability of a man dying in his 40th year and of marrying in his
41st year is 0.011 + 0.009 = 0.02. The two events do not belong
to the same set.

Multiplication Theorem 

Before discussing the theorem it is essential for us to know
the meaning of dependent (or conditional) and independent
(or unconditional) events. Two events are said to be
independent when they have no influence on each other. The
result of the first toss of a coin does not presumably affect the
result of the second toss. This can be illustrated by a simple
example.

A bag contains 5 balls : 1 white, 1 red, 1 blue and 2 green.
Find the chance of

(i) drawing a green ball in the first draw,

(ii) drawing a green ball in the second draw.

In the first draw the chance of drawing one green ball is 2/5





(as the favourable cases are two and the total number of equally
likely cases is 5). If the ball drawn in the first draw is again
mixed in the bag, the chance of getting a green ball in a second
draw remains 2/5 (as the favourable cases are two and the total
number of cases is 5). Here the two events (results of the
first and second draw) are independent of each other, i.e., the
result of the first event does not influence the result of the
second.

On the other hand, had we not remixed the ball drawn in
the first draw, the total number of cases in the bag for the
second draw would have been four. Suppose also that the
ball drawn in the first draw was green ; then the chance of
getting a green ball in the second draw would be 1/4 (as there
remains only one green ball). If the ball drawn in the first
draw happened to be white, then the chance of getting a green
ball in the next draw would be 2/4. Thus we see that the
result of the second event (draw) is influenced by the result of
the first event (draw) ; hence, the two events (results of first and
second draw) are dependent and the probability of the second
event in such a case is called the conditional probability.
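
The difference between the two situations can be sketched in Python by removing (or not removing) the ball first drawn. This is an illustrative sketch only ; the names `bag`, `chance_of_green` and `second_draw_chance` are not from the text.

```python
from fractions import Fraction

bag = ["white", "red", "blue", "green", "green"]


def chance_of_green(contents):
    """Chance of drawing a green ball from the given contents."""
    return Fraction(contents.count("green"), len(contents))


def second_draw_chance(first_colour):
    """Chance of green in the second draw when the first ball is NOT put back."""
    remaining = list(bag)
    remaining.remove(first_colour)  # take out one ball of the colour first drawn
    return chance_of_green(remaining)


first_draw = chance_of_green(bag)               # 2/5
second_with_replacement = chance_of_green(bag)  # bag unchanged, still 2/5
after_green = second_draw_chance("green")       # 1/4
after_white = second_draw_chance("white")       # 2/4, i.e. 1/2

print(first_draw, second_with_replacement, after_green, after_white)
```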

The probability of a compound event, formed by two or 
more independent events, happening together, is the product of 
the probabilities of each one of them separately. In other 
words, the probability of the joint occurrence of independent 
events is the product of their individual probabilities. 

For two independent events :

P(A and B) = P(A) × P(B)

For more than two independent events the above rule may be
extended as :

P(A and B and C ...) = P(A) × P(B) × P(C) × ...

For two dependent events :

The probability of a compound event, formed by two dependent
events A and B happening together, is equal to the
probability of the event A multiplied by the conditional probability
of the event B (i.e., probability calculated after A has taken
place).




The rule of conditional probability can be stated symbolically
as below :

P(A and B) = P(A) × P(B | A), where P(B | A) represents the
conditional probability of event B, i.e., the probability of B when A
has occurred.

Illustrations : 

(i) What is the chance of showing two sixes in two throws
of a die ?

Solution :

The chance of a six in first throw is 1/6
The chance of a six in second throw is 1/6
The chance of two sixes in two throws is 1/6 × 1/6 = 1/36

(ii) What is the chance of getting all the heads in three
throws of a coin ?

Solution :

The chance of getting head in the first throw is 1/2
The chance of getting head in the second throw is 1/2
The chance of getting head in the third throw is 1/2
Hence the chance of getting heads in all the three
throws is 1/2 × 1/2 × 1/2 = 1/8

(iii) If in a population of adults, { are classified as short,
and 1/4 as superior in intelligence, what is the probability of
selecting at random a person who is both short and superior in
intelligence ?

Solution : 

Here the two variates, height and intelligence, are practically
independent of each other, i.e., the happening of one event is
not dependent upon the occurrence of the other, so we will
apply the rule of the joint occurrence of two independent
events :

P(A and B) = P(A) × P(B)

where A represents the short-statured characteristic and B
superiority in intelligence, and P the probability of these.

Therefore 

P{A and B) ** */* < 1 1 ;j w 





(iv) In tossing a coin find the chance of throwing head
and tail alternately in three successive trials.

Solution :

Here the first throw must give either head or tail ; the
chance that the second gives the opposite to the first is 1/2 and the
chance that the third throw is the same as the first is 1/2.
Hence the chance of the compound event = 1/2 × 1/2 = 1/4

(v) What is the chance of throwing 6 with a die at least
once in 3 trials ?

Solution :

The probability of the happening of an event at least once
can be found out by subtracting the probability of its not
happening at all from 1.

The probability of not getting 6 in a single throw is 5/6.

Since 3 successive throws are to be made, therefore, the
probability of not getting 6 in 3 throws will be

5/6 × 5/6 × 5/6 = 125/216 = 0.579

Therefore the probability of at least one throw of six in
three trials is

1 − 125/216 or 1 − 0.579 or 0.421
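
The "at least once" rule used in the last solution can be written as a small Python function. This is a sketch for illustration ; the name `at_least_once` is not from the text, and the trials are assumed to be independent.

```python
from fractions import Fraction


def at_least_once(p, trials):
    """Chance that an event of probability p happens at least once
    in the given number of independent trials: 1 - (1 - p)^trials."""
    return 1 - (1 - p) ** trials


six_in_three = at_least_once(Fraction(1, 6), 3)
print(six_in_three, round(float(six_in_three), 3))  # 91/216 0.421
```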

Moroney (Facts from Figures) has explained the law of
multiplication by the following illustration :

"Consider the case of a man who demands the simultaneous
occurrence of many virtues of an unrelated nature in his young
lady. Let us suppose that he insists on a Grecian nose,
platinum-blonde hair, eyes of odd colours, one blue, one brown and
finally, a first class knowledge of statistics. What is the
probability that the first lady he meets in the street will put ideas of
marriage into his head ?" To solve the problem we must
know the probabilities for the several different requisites.
Suppose that the

Probability of finding a lady with Grecian nose is : 0.01
Probability of finding a lady with platinum-blonde hair is : 0.01
Probability of finding a lady with odd eyes is : 0.001, and
Probability of finding a lady having first class knowledge of
Statistics is : 0.00001

In order to calculate the probability that all these desirable
attributes will be found in one lady, we use the multiplication
law. Multiplying together the individual probabilities, we find
for our result that the probability of the first young lady he
meets, or indeed any lady chosen at random, coming up to his
requirements = 0.000000000001, or precisely one in a billion
(a million million).

Illustrations :

Q. 1. What is the probability that a vowel selected at
random in any English book is an 'i' ?

Ans. If 'a' is the total number of equally likely cases, and
'b' the favourable, then the probability would be : b/a

In the question a = 5 and b = 1.

The required probability = 1/5

Q. 2. What is the probability of throwing at least one ace
in a single throw with two dice ?

Ans. The possible number of cases is (6 × 6) = 36.

There are 5 ways in which each die can be thrown so as not to
give an ace ; hence 25 throws of two dice will exclude aces, i.e.,
the probability of not throwing one or more aces is 25/36.
The chance of throwing at least one ace = 1 − 25/36 = 11/36.

Q. 3. What is the chance of throwing a total of 7 in a
throw of two dice ?

Ans. The total number of possible cases is (6 × 6) = 36.
Now let us make a list of favourable cases. We may label the
dice X and Y to distinguish them. This total may be secured
from any one of the following six combinations :

Die X    Die Y
  1        6
  2        5
  3        4
  4        3
  5        2
  6        1

The chance of securing an ace with die X is 1/6, of securing a
6 with die Y is 1/6. The chance of the two in combination is
1/36. Similarly the probability of each of the other 5
combinations is 1/36. But any one of these six results will give a total of
seven and will be deemed as successful.

Hence

p = 1/36 + 1/36 + 1/36 + 1/36 + 1/36 + 1/36 = 6/36 = 1/6.

We have in this solution answered the question "What is the
chance of securing exactly seven in the throw of two dice ?"
We might put the question "What is the chance of securing at
least 7 in the throw of two dice ?" In this case a total of 7 or
more will be considered a favourable event.

Probability of throwing  7 with two dice = 6/36
     ,,          ,,      8   ,,    ,,    = 5/36
     ,,          ,,      9   ,,    ,,    = 4/36
     ,,          ,,     10   ,,    ,,    = 3/36
     ,,          ,,     11   ,,    ,,    = 2/36
     ,,          ,,     12   ,,    ,,    = 1/36

Sum of above probabilities = 21/36

Hence the chance of throwing at least seven (we may as well
say more than six) in the throw of 2 dice is 21/36 or 7/12.
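
The answers to the dice questions can be confirmed by listing all 36 throws. The Python sketch below is an illustration only and not part of the original text ; the names `throws`, `totals` and `chance_of_total` are chosen here.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely results of throwing two dice X and Y.
throws = list(product(range(1, 7), repeat=2))
totals = Counter(x + y for x, y in throws)


def chance_of_total(t):
    """Chance that the two dice add up to exactly t."""
    return Fraction(totals[t], len(throws))


p_exactly_7 = chance_of_total(7)
p_at_least_7 = sum(chance_of_total(t) for t in range(7, 13))
p_at_least_one_ace = Fraction(
    sum(1 for x, y in throws if x == 1 or y == 1), len(throws)
)

print(p_exactly_7, p_at_least_7, p_at_least_one_ace)  # 1/6 7/12 11/36
```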

Q. 4. 8 Hindus, 7 Sikhs, 6 Pakistanis, 5 Burmese and 4
Ceylonese apply for a post to which only one man will be
recruited. Due to some reasons, the interviewing board decided
to draw a name out of the hat. What is the probability of the
job going to an Indian ?

Ans.

Probability of a Hindu = 8/30
Probability of a Sikh = 7/30
Probability of an Indian = 8/30 + 7/30 = 15/30 or 1/2








Q. 5. One sack contains 5 white, 6 red and 7 black balls
and another sack contains 4 white, 6 red and 7 black balls. A
ball is drawn from each sack. What is the chance that both
balls will be white ?

Ans. Drawing a ball from one sack has nothing to do
with what colour will be drawn from the other sack. So the
two events (drawings) are independent.

The probability of getting a white from sack 1 = 5/18
The probability of getting a white from sack 2 = 4/17
The probability of getting both balls white = 5/18 × 4/17 = 10/153

Q. 6. One sack contains 5 white, 6 red and 7 black balls.
Two drawings of one ball each are made such that (a) the ball
is replaced before the second trial, (b) the ball is not replaced
before the second trial. Find the probability that both trials
will give white balls.

Ans.

(a) The chance of getting 1 white in first trial = 5/18
The chance of getting 1 white in second trial = 5/18
The chance of getting 2 white in two trials = 5/18 × 5/18 = 25/324

(b) The chance of getting 1 white in first trial = 5/18
The chance of getting 1 white in second trial = 4/17
The chance of compound event = 5/18 × 4/17 = 10/153

Q. 7. In a bag there are 4 white and 3 black balls. What is
the chance that if four of them are drawn one by one, (a) the 1st
will be white, the 2nd black, the 3rd white and the fourth black,
(b) the balls so drawn are alternately of different colours ?

Ans.

(a) The chance of getting first white = 4/7
The chance of getting second black = 3/6
The chance of getting third white = 3/5
The chance of getting fourth black = 2/4
The chance of compound event = (4 × 3 × 3 × 2)/(7 × 6 × 5 × 4) = 3/35

(b) Beginning with white we have seen the chance of the
compound event = 3/35.

Beginning with black we will find that the chance of the
compound event is again 3/35.

The above two events are mutually exclusive. Therefore,
the required chance that 4 successively drawn balls are
alternately of different colours = 3/35 + 3/35 = 6/35.

Q. 8. From a pack of 52 cards two are drawn at random ;
find the chance that one is a King and the other a Queen.

Ans.

The chance of getting first card King and second Queen
= 4/52 × 4/51

The chance of getting first card Queen and second King
= 4/52 × 4/51

As both of these events are favourable,

∴ The required probability = (4/52 × 4/51) + (4/52 × 4/51) = 8/663

Q. 9. A bag contains 6 white and 9 black balls. Two
drawings of 4 balls are made such that (a) the balls are replaced
before the second trial, (b) the balls are not replaced before
the second trial. Find the probability that the first drawing will
give 4 white and the second 4 black balls in each case.

Ans.

(a) The chance of getting 4 white balls in the first trial
= 6/15 × 5/14 × 4/13 × 3/12 = 1/91

The chance of getting 4 black balls in the second trial
= 9/15 × 8/14 × 7/13 × 6/12 = 6/65

∴ The chance of compound event
= (6 × 5 × 4 × 3)/(15 × 14 × 13 × 12) × (9 × 8 × 7 × 6)/(15 × 14 × 13 × 12) = 6/5,915

(b) The chance of getting 4 white balls in the first trial
= (6 × 5 × 4 × 3)/(15 × 14 × 13 × 12) = 1/91

The chance of getting 4 black balls in the second trial
= (9 × 8 × 7 × 6)/(11 × 10 × 9 × 8) = 21/55

The chance of compound event = 21/55 × 1/91 = 3/715



Chapter 14 

Binomial, Normal and Poisson Distributions 


Binomial Distribution

It has been explained in the chapter on 'probability' that
if a coin is tossed, the sum of the probabilities of the
occurrence of head (or success) and tail (failure) falling upwards
is p + q, where p is the probability of success and q of failure.
The word 'failure' can also be expressed as zero success.
So instead of stating the probabilities of failure and success,
one can express them as probabilities of 0 and 1 success which
are equal to q and p.

Next, supposing two coins x and y are tossed together, the
possibilities of the coins falling are :

x and y both falling heads,
x falling head and y tail upwards,
y falling head and x tail upwards,
x and y both falling tails.

The probabilities of :

Two heads (or 2 successes) = p × p = p^2
One head and one tail (or only 1 success) = (p × q) + (q × p) = 2qp
Two tails (or 0 success) = q × q = q^2

Here the probabilities of 0, 1, and 2 successes are given
by q^2, 2qp, p^2 respectively, i.e., by the successive terms of the
expansion of the binomial (q + p)^2.

Similarly if 3 coins are tossed simultaneously, probabilities
of 0, 1, 2 and 3 successes will respectively be given by the
terms q^3, 3q^2 p, 3q p^2, p^3, i.e., the successive terms of binomial
(q + p)^3.

From these results it is evident that the probabilities of
exactly 0, 1, 2 and 3 ... successes are given :

for one coin or event by the various terms of binomial
expansion of (q + p)^1, i.e., q + p,

for two coins or events by (q + p)^2, i.e., q^2 + 2qp + p^2,

for three coins or events by (q + p)^3, i.e., q^3 + 3q^2 p + 3q p^2 + p^3,

similarly, for n coins or events by (q + p)^n or the successive
terms of the binomial expansion (q + p)^n.


If the probabilities of various successes or events 0, 1, 2, 3,
... n, as obtained above are written against the number
of successes, we get a distribution of probabilities as follows :

No. of successes (x)    0      1              2                             ...   n
Probability (p)         q^n    n q^(n−1) p    [n(n−1)/(1 × 2)] q^(n−2) p^2   ...   p^n

The above probability distribution is known as the
binomial probability distribution or more commonly as
binomial distribution.

The general term of this distribution is nCr q^(n−r) p^r.

Explanation :

Consider n independent trials. Let the probability of
success in 1 trial be p and the probability of failure be q.
Let us calculate the probability of r successes and n − r
failures out of these n trials. Let the first r trials be all
successes and the next n − r trials be all failures. The probability of
joint occurrence of r successes is the product of the
individual probabilities of the success, i.e., p × p × ... r times = p^r.
Similarly, the probability of n − r failures is q^(n−r). Thus
the probability of getting exactly r successes and n − r
failures in a specific order is p^r q^(n−r). But all these r successes
need not occur in the first r successive trials ; they may be
spread over all the n trials. The number of ways in which
we can get r successes and n − r failures out of n trials is

n! / (r! (n − r)!) or nCr

All these nCr ways of occurrence are mutually exclusive,
with probability p^r q^(n−r) each time. So the total probability
of r successes and n − r failures in n trials is

p^r q^(n−r) + p^r q^(n−r) + ... nCr times, i.e., nCr q^(n−r) p^r

This means that any term of the binomial expansion can be
obtained by making use of the general term also. If in the
general term nCr q^(n−r) p^r the values of r are taken as 0, 1, 2 and
3 ... etc. we shall be able to obtain the first, second, third,
fourth, ... etc. terms of the binomial expansion. Thus if
it is desired to find the chance of 0 success in n trials r
should be taken as 0, for the chance of 1 success r should be
taken as 1 and so on.
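
The general term lends itself to a short Python function. This is a sketch for illustration, not part of the original text ; the name `binomial_term` is chosen here, and exact fractions are used so that the terms can be compared with the expansions of (q + p)^n given earlier.

```python
from fractions import Fraction
from math import comb


def binomial_term(n, r, p):
    """General term nCr * q^(n-r) * p^r: chance of exactly r successes
    in n independent trials, where q = 1 - p."""
    q = 1 - p
    return comb(n, r) * q ** (n - r) * p ** r


# Three unbiased coins: the terms of (q + p)^3 with p = q = 1/2.
p = Fraction(1, 2)
three_coins = [binomial_term(3, r, p) for r in range(4)]
print(three_coins)       # the four terms: 1/8, 3/8, 3/8, 1/8
print(sum(three_coins))  # the terms of (q + p)^n always add up to 1
```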

Example :

Find the chance of getting 3 successes in 5 trials when the
chance of getting a success in one trial is 2/3.

Here n = 5, p = 2/3, r = 3, q = 1 − p = 1 − 2/3 = 1/3.

Substituting these values in the general term, the required chance is
5C3 (1/3)^2 (2/3)^3 = 80/243, i.e., 0.33 approx.

The coefficients for the successive values of exactly 0, 1, 2,
3 ... successes can also be obtained from what is known
as Pascal's Triangle (Blaise Pascal, 1623-62) by writing on
successive lines in the form of a triangle as given below :

Success or
power of the      Coefficients in the expansion of (q + p)^n
exponent
     0            1
     1            1    1
     2            1    2    1
     3            1    3    3    1
     4            1    4    6    4    1
     5            1    5   10   10    5    1
     6            1    6   15   20   15    6    1
     7            1    7   21   35   35   21    7    1

It is clear from the above that in any line each entry is the
sum of the two entries in the line above to its immediate left
and right. Thus in the seventh line (success 6), we have
6 = 1 + 5, 15 = 5 + 10, and so on.
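
The rule for building each line from the line above can be sketched in Python as follows. This is an illustration only ; the function name `pascal_rows` is not from the text.

```python
def pascal_rows(n_max):
    """Lines of Pascal's Triangle for exponents 0 to n_max.
    Each inner entry is the sum of the two entries above it."""
    row = [1]
    rows = [row]
    for _ in range(n_max):
        row = [1] + [a + b for a, b in zip(row, row[1:])] + [1]
        rows.append(row)
    return rows


for n, row in enumerate(pascal_rows(7)):
    print(n, row)
```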

It has been stated above that the probabilities of exactly
0, 1, 2, 3 ... successes would be obtained by the expansion
of (q + p)^n. When such probabilities are written against the
corresponding number of successes, the distribution so formed
is a distribution of probabilities and is called the Binomial
Probability Distribution, or commonly known as Binomial
Distribution. If, however, such n events are repeated N times, the
frequencies of 0, 1, 2, 3 ... successes will be obtained by the
successive terms in the expansion of N(q + p)^n

or N [q^n + n q^(n−1) p + {n(n−1)/(2 × 1)} q^(n−2) p^2
        + {n(n−1)(n−2)/(3 × 2 × 1)} q^(n−3) p^3 + ... + p^n]

The frequencies obtained for the successive values of exactly
0, 1, 2, 3 ... successes in N trials for n events are called the Binomial
Distribution or the Bernoulli Distribution, after James Bernoulli
(1654-1705) who stated the foregoing result in his Ars
Conjectandi published in 1713. These frequencies are also known as
expected or theoretical frequencies. The frequencies which
are actually obtained by making experiments for n events in N
trials are called observed or actual frequencies. Generally
there will be a difference between the expected and observed
frequencies, but this difference will usually be little if N is
very large.

It may not be out of order to mention here that the variable
in the Binomial Distribution is discrete and not continuous.

Illustration :

In how many cases should we expect to get 6 heads and 4
tails if 10 coins are simultaneously tossed 1,000 times ?

Solution :

Here p = q = 1/2, N = 1,000, n = 10.

So the expansion of N(q + p)^n will give the numbers of
successes 0, 1, 2, ... 10.

Number for r successes is given by N nCr q^(n−r) p^r.

Here r = 6. So the required number is 1,000 × 10C6 (1/2)^4 (1/2)^6

= 1,000 × (10 × 9 × 8 × 7)/(4 × 3 × 2 × 1) × 1/1,024 = 13,125/64 = 205 approx.

Note : Instead of 6 heads, if at least 6 heads had been
asked in the question, then one should get the sum of
terms with 6, 7, 8, 9 and 10 successes.
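
The expected number of cases, and the sum mentioned in the note, can be worked out with a few lines of Python. The sketch is for illustration only ; the function name `expected_frequency` is chosen here and is not from the text.

```python
from math import comb


def expected_frequency(N, n, r, p):
    """Expected number of cases with exactly r successes when n events
    are repeated N times: N * nCr * q^(n-r) * p^r."""
    q = 1 - p
    return N * comb(n, r) * q ** (n - r) * p ** r


exactly_six = expected_frequency(1000, 10, 6, 0.5)
at_least_six = sum(expected_frequency(1000, 10, r, 0.5) for r in range(6, 11))

print(round(exactly_six))   # 205
print(round(at_least_six))  # 377
```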





Illustration :

The normal rate of infection of a certain disease in animals
is known to be 40%. In an experiment with 6 animals injected
with a new vaccine it was observed that none of the animals
caught infection. Calculate the probability of the observed
result.

Solution :

Here we have to find the probability of 0 success.

Let p = 40% = 2/5
    q = 60% = 3/5

We know that the probability of 0 success in n events is
given by q^n.

So here the probability of 0 success = (3/5)^6 = 729/15,625 = 0.0467

General Form of Binomial Distribution

The general form of binomial distribution depends on the
following two points :

1. The values of q and p, and

2. The value of the exponent n.

If q and p are equal, i.e., q = p = 1/2, the frequency
distribution will be symmetrical and the frequencies at equal
distances on either side of the mean (or mode or median) will be equal.
For instance, if four coins are tossed 1,296 times, the frequencies
for 0, 1, 2, 3 and 4 heads will be 81, 324, 486, 324 and 81 respectively.

If, however, q is not equal to p, i.e., (q ≠ p), the distribution
will be asymmetrical. Thus if q in the above illustration had
been 2/3 and p 1/3, the frequencies for 0, 1, 2, 3 and 4
successes would be 256, 512, 384, 128 and 16 respectively.
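
Both sets of frequencies follow from the terms of 1,296 (q + p)^4 and can be reproduced in Python. This is a sketch for illustration ; the function name `expected_frequencies` is not from the text.

```python
from fractions import Fraction
from math import comb


def expected_frequencies(N, n, p):
    """Terms of N (q + p)^n for 0, 1, ..., n successes, with q = 1 - p."""
    q = 1 - p
    return [int(N * comb(n, r) * q ** (n - r) * p ** r) for r in range(n + 1)]


symmetrical = expected_frequencies(1296, 4, Fraction(1, 2))   # q = p = 1/2
asymmetrical = expected_frequencies(1296, 4, Fraction(1, 3))  # q = 2/3, p = 1/3

print(symmetrical)   # [81, 324, 486, 324, 81]
print(asymmetrical)  # [256, 512, 384, 128, 16]
```

In each case the frequencies add up to 1,296, the number of times the four coins are tossed.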

The presence of skewness in the above distribution can be
reduced by either of the following two steps :

1. By so adjusting the value of q that it may be nearer the
value of p. If this process is continued, a point will come
when p = q and the moment this position is obtained the
distribution will become perfectly symmetrical, which has
already been mentioned.

2. By increasing the value of the exponent n. The increase
in n when p ≠ q (i.e., p is not equal to q), not only raises the
mean and increases the dispersion, but also lessens the skewness.
For the same values of q and p, the greater the n, the lesser will
be the asymmetry.

The following tables illustrate these points :

TABLE A
Terms of the binomial series 10,000 (q + p)^20 for values

No. of        q = 9/10     q = 3/5
successes     p = 1/10     p = 2/5
    0           1,216         —
    1           2,702          5
    2           2,852         31
    3           1,901        123
    4             898        350
    5             319        746
    6              89      1,244
    7              20      1,659
    8               4      1,797
    9               1      1,597
   10               —      1,171
   11               —        710
   12               —        355
   13               —        146
   14               —         49
   15               —         13
   16               —          3
   17               —          —
   18               —          —
   19               —          —
   20               —          —

TABLE B
Terms of the binomial series 10,000 (q + p)^100 for values

No. of        q = 9/10
successes     p = 1/10
    0              —
    1              3
    2             16
    3             59
    4            159
    5            339
    6            596
    7            889
    8          1,148
    9          1,304
   10          1,319
   11          1,199
   12            988
   13            743
   14            513
   15            327
   16            193
   17            106
   18             54
   19             26
   20             12
   21              5
   22              2
   23              1



Mean and Standard Deviation of Binomial Distribution

Arithmetic Mean

If p is the probability of success and q of failure in one
trial, n the number of independent events, then the
probability of 0, 1, 2, 3 ... n successes is given by the 1st, 2nd, 3rd
... (n + 1)th term in the binomial expansion of (q + p)^n, i.e., the
binomial distribution given as :

No. of successes (x)    0      1              2                             ...   n
Probability (p)         q^n    n q^(n−1) p    [n(n−1)/(1 × 2)] q^(n−2) p^2   ...   p^n

The Arithmetic mean of a distribution is given by Σxp / Σp.

Σp = q^n + n q^(n−1) p + ... + p^n
   = (q + p)^n = 1^n = 1 (as q + p = 1)

Σxp = 0 . q^n + 1 . n q^(n−1) p + 2 . {n(n−1)/(1 × 2)} q^(n−2) p^2 + ... + n . p^n
    = n q^(n−1) p + n(n−1) q^(n−2) p^2 + ... + n p^n
    = np [q^(n−1) + (n−1) q^(n−2) p + ... + p^(n−1)]
    = np (q + p)^(n−1) = np (1)^(n−1) = np (as q + p = 1)

∴ Mean = Σxp / Σp = np / 1 = np

So the mean of the binomial distribution is np.
Standard Deviation

σ^2 = ν2 − (ν1)^2, taking ν1 and ν2 as moments about the origin
zero.

We know that mean, being about origin,

ν1 = Σxp / Σp = Σxp = np
ν2 = Σx^2 p / Σp = Σx^2 p          (as Σp = 1)

Σx^2 p = 0^2 . q^n + 1^2 . n q^(n−1) p + 2^2 . {n(n−1)/(2 × 1)} q^(n−2) p^2
           + 3^2 . {n(n−1)(n−2)/(3 × 2 × 1)} q^(n−3) p^3 + ... + n^2 p^n

       = n q^(n−1) p + 2n(n−1) q^(n−2) p^2
           + {3n(n−1)(n−2)/(2 × 1)} q^(n−3) p^3 + ... + n^2 p^n

       = np [q^(n−1) + 2(n−1) q^(n−2) p
           + {3(n−1)(n−2)/(2 × 1)} q^(n−3) p^2 + ... + n p^(n−1)]

Breaking the second, third and following terms into two
parts we get

       = np [{q^(n−1) + (n−1) q^(n−2) p + {(n−1)(n−2)/(2 × 1)} q^(n−3) p^2 + ... + p^(n−1)}
           + {(n−1) q^(n−2) p + {2(n−1)(n−2)/(2 × 1)} q^(n−3) p^2 + ... + (n−1) p^(n−1)}]

       = np [(q + p)^(n−1) + (n−1) p {q^(n−2) + (n−2) q^(n−3) p + ... + p^(n−2)}]
       = np [1 + (n−1) p (q + p)^(n−2)]
       = np [1 + (n−1) p]
       = np (1 + np − p)
       = np + n^2 p^2 − n p^2

So Σx^2 p or ν2 = np + n^2 p^2 − n p^2

σ^2 = ν2 − (ν1)^2
    = np + n^2 p^2 − n p^2 − (np)^2
    = np + n^2 p^2 − n p^2 − n^2 p^2
    = np − n p^2
    = np (1 − p)
    = np (q)

or σ = √(npq)

Thus the mean and standard deviation of the binomial distribution
are np and √(npq) respectively. Other constants of the
binomial distribution are :

μ3 = npq (q − p)

μ4 = 3 p^2 q^2 n^2 + npq (1 − 6pq)

β1 = (q − p)^2 / npq

β2 = 3 + (1 − 6pq) / npq
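
These constants can be verified numerically by summing over the distribution directly. The Python sketch below is only an illustration and not part of the original text ; the function name `central_moments` and the trial values n = 10, p = 1/3 are chosen arbitrarily for the check.

```python
from fractions import Fraction
from math import comb


def central_moments(n, p):
    """Mean and the 2nd, 3rd and 4th moments about the mean of the
    binomial distribution, found by direct summation over x = 0 ... n."""
    q = 1 - p
    probs = [comb(n, x) * q ** (n - x) * p ** x for x in range(n + 1)]
    mean = sum(x * pr for x, pr in enumerate(probs))
    mu2, mu3, mu4 = (
        sum((x - mean) ** k * pr for x, pr in enumerate(probs)) for k in (2, 3, 4)
    )
    return mean, mu2, mu3, mu4


n, p = 10, Fraction(1, 3)  # arbitrary trial values
q = 1 - p
mean, mu2, mu3, mu4 = central_moments(n, p)

print(mean == n * p)                      # mean = np
print(mu2 == n * p * q)                   # variance = npq
print(mu3 == n * p * q * (q - p))         # third moment
print(mu4 == 3 * p**2 * q**2 * n**2 + n * p * q * (1 - 6 * p * q))
```

Exact fractions are used so that the comparisons are not disturbed by rounding.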


Illustration :

Seven coins are tossed and the number of heads noted.
This experiment is repeated 128 times and the following is the
observed frequency distribution of the 128 throws according
to the number of heads :

No. of heads :   0    1    2    3    4    5    6    7
Throws :         7    6   19   35   30   23    7    1

Total : 128

Fit a binomial distribution under the hypothesis that the
coins are unbiassed. Also find its mean and standard
deviation.

Solution :

The frequencies of the successive values 0, 1, 2, 3, ... 7
will be obtained by the expansion of

128 (1/2 + 1/2)^7

or by N (q^n + nC1 q^(n−1) p + ... + p^n)

i.e., 128 {(1/2)^7 + 7 (1/2)^7 + 21 (1/2)^7 + 35 (1/2)^7 + 35 (1/2)^7
          + 21 (1/2)^7 + 7 (1/2)^7 + (1/2)^7}

= 128 {1/128 + 7/128 + 21/128 + 35/128 + 35/128 + 21/128 + 7/128 + 1/128}

= 1 + 7 + 21 + 35 + 35 + 21 + 7 + 1

Thus the theoretical frequencies for

0    1    2    3    4    5    6    7   heads will be
1    7   21   35   35   21    7    1






Mean of the distribution is given by np where n is the
number of coins and p is the probability of success.
np = 7 × 1/2 = 3.5. Standard deviation or σ = √(npq)
= √(7 × 1/2 × 1/2) = √1.75 = 1.323. So the mean and standard
deviation of this distribution are 3.5 and 1.323 respectively.
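
The fitting can be repeated in Python and set beside the observed throws. This is an illustrative sketch only ; the variable names are chosen here and are not from the text.

```python
from math import comb, sqrt

observed = [7, 6, 19, 35, 30, 23, 7, 1]  # throws with 0, 1, ..., 7 heads
N = sum(observed)                        # 128 throws in all
n = len(observed) - 1                    # 7 coins
p = 0.5                                  # hypothesis: unbiassed coins
q = 1 - p

expected = [N * comb(n, r) * q ** (n - r) * p ** r for r in range(n + 1)]
mean = n * p
standard_deviation = sqrt(n * p * q)

for heads, (obs, exp_freq) in enumerate(zip(observed, expected)):
    print(heads, obs, exp_freq)
print(mean, round(standard_deviation, 3))  # 3.5 1.323
```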

Normal Distribution 

It has been stated earlier that the binomial distribution is
a discrete distribution as the expansion of N(q + p)^n gives the
expected frequencies of 0, 1, 2, ... n successes. In other
words, this as such is not a continuous distribution. However,
this distribution may be presented by a histogram marking
the horizontal scale in terms of n units. If the exponent n
increases, the number of intervals in the histogram will also
increase because of the greater number of successes. When n
is very large, the number of intervals becomes very great,
whereby the distance between the intervals narrows down and
the histogram begins to suggest a smooth curve. If the
exponent n becomes infinitely great where p = q, the histogram gives
a look of a perfect symmetrical curve. The limiting form of this
histogram is the normal curve. Thus when limitless unbiassed
coins are tossed simultaneously (i.e., when the binomial distribution
is of the form (0.5 + 0.5)^n where n is limitless, also called the
limiting form of the discrete binomial distribution), where
each coin is as likely to fall one way as the other, and the
frequencies of the successive successes are plotted on a graph
paper, the curve that will be produced will be a perfectly
smooth symmetrical curve known as the normal curve.

But in case p ≠ q, and the exponent n is infinitely large, even
then we could get an almost, though not an exactly perfect,
symmetrical curve, which will also closely resemble the normal
curve.

The normal curve is a special type of symmetrical frequency
curve which is symmetrical not only with regard to its sides
but also with regard to its peak. The peak of this curve is
related in a definite manner to its dispersion. It is a bell-shaped
continuous curve and most of the frequency distributions,
specially of natural phenomena, vary approximately in
accordance with this curve.

Fig. 14.1

Many of the statistical methods
of describing and interpreting data are directly attributable to
the properties of the normal curve. Without it, the technique of
sampling could not have developed to its present stage. In
fact it is a tool indispensable to modern statistics. The figure
of this curve is shown above.

Discovery of the Normal Curve 

This curve was originally developed by Abraham de
Moivre (1667-1754), a French mathematician who spent 66
years of his life in England as a refugee mathematician, and
the equation of this curve was published in 1733. While working
on certain problems in games of chance, he discovered that the
random variation in the number of heads
appearing on throws of n coins corresponds to the terms of the
binomial expansion of (1/2 + 1/2)^n and as n becomes larger this
distribution approaches a definite form. Because of this origin
and because the data from certain coin throwing experiments
closely approached it in form, it is often called the Normal
Probability Curve. To de Moivre, it was a purely mathematical
exercise, completely unconnected with the application to
empirical data.

Jacques Bernoulli (1654-1705), a Swiss mathematician,
suggested the application of the theory of probability in
economic and moral affairs, but could not personally investigate
the applications because of the paucity of numerical data.

Actual statistical use of the normal curve begins with the
work of the famous French Mathematical Astronomer, Laplace
(1749-1827) and German Astronomer, Gauss (1777-1855), each
of whom derived the law independently and presumably without
knowing of de Moivre's treatment. Gauss found that it
represented very satisfactorily the errors of observation in the
physical sciences. On account of this reason, this curve was
called the Normal Curve of Error, where error is used in the sense
of deviation from the true value. This curve is also called the
Gaussian Curve because of the supposition that Gauss has been
the first person to make use of its properties. Lambert A.
Quetelet (1796-1874), the great Belgian Statistician, was
probably the first in popularizing the idea that statistical method
was a fundamental discipline adaptable to any field of human
interest in which mass data were to be found, and thought that
the distribution of the sufficient and trustworthy data of the
measurement of mental and moral traits would be found to be
in accordance with the normal curve of error.

Since that time, experience has shown that it serves quite
well to describe many of the distributions, such as stature and
other anthropometric measurements, age, etc., which arise in
the fields of biology, education and sociology. Much of the
theory of statistics is built around it.

Factors which lead to the emergence of a Normal Curve

The physical conditions which lead to the emergence of a
normal curve are :

1. The forces which affect individual events are
independent of one another ;

2. These forces are numerous and of equal weight ;

3. Such forces affect the individual event in such a way
that the maximum frequencies are clustered around the mean
value and give rise to a symmetrical curve and further that the
positive and negative deviations from this mean value are
equally likely and large deviations are less likely to occur than
the small ones.

Mathematical Equation 

It has been shown that the frequencies of 0, 1, 2, ...
successes are given by the expansion of $N(q+p)^n$, and that
whenever $p = q$ the binomial series gives a symmetrical
frequency distribution. But as the value of n gets very large, the
problem of computing these frequencies becomes difficult and
tedious. Naturally a method is sought for the
removal of this handicap, and this is attained by the application
of the normal curve. This curve not only eliminates the cumbrous
calculations but also gives a frequency distribution very
closely approximating the binomial distribution*. This becomes
possible as the basic constants of the normal probability curve
have been tabulated and are readily available. It should be
noted that the normal curve is not a discrete series of terms but is
a continuous algebraic function given by the following mathematical
equation :

$$y = \frac{N}{\sigma\sqrt{2\pi}}\; e^{-\frac{x^2}{2\sigma^2}}$$

where :

y = Ordinate of the curve at a point x,

N = Number of items in the frequency distribution,

σ = Standard deviation of the distribution,

π = The ratio of the circumference of a circle to its diameter,
approximately 3.1416; $\sqrt{2\pi} = 2.5066$,

e = 2.718 (Base of the Napierian Logarithm),

N and σ are constant for a given curve, but change from one
curve to the next; they are, therefore, parameters of the curve.

* As n approaches infinity and p (or q) is approximately equal to 1/2, it
can be shown mathematically that the various terms in the expansion
of $N(q+p)^n$ take the form of the Normal Curve mentioned above.






This equation can also be expressed as

$$Y = Y_0\, e^{-\frac{x^2}{2\sigma^2}}$$

where $Y_0$ is the height of the maximum ordinate or ordinate
at the mean, a constant in the equation, and

$$Y_0 = \frac{N}{\sigma\sqrt{2\pi}} = \frac{N}{2.5066\,\sigma},$$

and, where the distribution has class intervals of i,

$$Y_0 = \frac{N i}{\sigma\sqrt{2\pi}} ;$$

$x = (X - \bar{X})$, i.e., any given value of the variable
expressed as a deviation from the arithmetic mean.

If N = 1, i.e., in terms of unit area,

$$Y = \frac{1}{\sigma\sqrt{2\pi}}\; e^{-\frac{x^2}{2\sigma^2}}$$


Standardised Normal Form

If the normal curve is changed to the standardised scale, i.e.,
values are expressed in terms of standard deviation units, called
the normal deviate or standardized variable, represented by
$x' = \dfrac{X - \bar{X}}{\sigma}$, then the x' or the normal deviate at the
mean will be zero (viz., $x' = \dfrac{\bar{X} - \bar{X}}{\sigma} = 0$ when
$X = \bar{X}$) and at values equal to one σ, 2σ and 3σ it will be
respectively 1, 2 and 3 (to the left of the mean these will be negative
as the X variates will be less than the value of the mean, and to the
right of the mean these units will be positive as the X variates will
be greater than the value of the mean). This is known as changing to
the standardized scale. It should always be noted that the mean = 0 in
this scale.


Here if we set N = 1, σ = 1, and $\bar{X} = 0$, then the mathematical
equation of the above stated curve, viz.,

$$Y = \frac{N}{\sigma\sqrt{2\pi}}\; e^{-\frac{x^2}{2\sigma^2}}$$

will be changed to

$$Y = \frac{1}{\sqrt{2\pi}}\; e^{-\frac{x'^2}{2}},$$

called the standardised normal form. As $\sqrt{2\pi} = 2.5066$, it is

$$Y = \frac{1}{2.5066}\; e^{-\frac{x'^2}{2}} = 0.3989\; e^{-\frac{x'^2}{2}}$$

And at the maximum ordinate, viz., at the point of the mean value
where x' = 0 (as the X-variate becomes the mean,
$x' = \dfrac{\bar{X} - \bar{X}}{\sigma} = 0$),

$$Y_0 = 0.3989$$
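The standardised form can be checked numerically. The following is a minimal sketch, not part of the original text: the function name `normal_ordinate` and its argument names are illustrative assumptions. It evaluates the ordinate at the mean and the ratio of the ordinate at one σ to the ordinate at the mean.

```python
import math

def normal_ordinate(x, mean=0.0, sigma=1.0, n=1.0):
    """Height y = N / (sigma * sqrt(2*pi)) * exp(-(x - mean)^2 / (2 * sigma^2))."""
    z = (x - mean) / sigma  # normal deviate x'
    return n / (sigma * math.sqrt(2.0 * math.pi)) * math.exp(-0.5 * z * z)

y0 = normal_ordinate(0.0)   # maximum ordinate of the standardised curve
y1 = normal_ordinate(1.0)   # ordinate one sigma away from the mean

print(round(y0, 4))         # 0.3989
print(round(y1 / y0, 5))    # 0.60653
```

The second figure is the fixed proportion (about 60.65 per cent) that the ordinate at one σ bears to the mean ordinate.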

Properties of the Normal Curve

A study of the normal curve reveals the under-mentioned
significant properties.

1. It is a symmetrical curve, i.e., it is a single peaked or
unimodal symmetrical curve. It has only one point of maximum
ordinate and so if a curve is bi-modal, it cannot be called
normal though it may be symmetrical. It means that the
distribution of the frequencies on either side of the maximum
ordinate of this curve is exactly the same. In other words, if
we fold the paper in such a way that the folding line is on the
ordinate drawn from the mean, then the curve on one side
of this will exactly fall on the curve on the other side of the
ordinate at the mean.

2. There are few items on the extremes and most of the
observations are clustered around the mean or the central value.
The mean, the median, and the mode coincide because the
distribution is symmetrical and single peaked. The maximum
ordinate frequency is at the centre of the distribution, i.e., at
the mean. In other words, we can say that the height of the
ordinate at the mid-point of the base below the apex of the
curve is the greatest, i.e., at the point x = 0, the ordinate has
the maximum height. The height of the ordinate at a distance
of one σ (sigma) from the mean is 60.653% of the height of the
mean ordinate. Likewise, the height of the other ordinates at
various sigma distances from the mean is in a fixed relationship
with the height of the mean ordinate.

3. The curve approaches nearer and nearer the horizontal
axis as the normal deviate $\dfrac{X - \bar{X}}{\sigma}$ increases in the plus or
minus direction, but theoretically it never reaches the axis, as
the curve is asymptotic to the base line. Though the curve
extends to infinity in both directions, yet hardly 0.3% of the
area lies beyond the limits of mean ± 3σ.

4. The curve is specified completely by defining the mean
(the origin of x), the standard deviation and the value N.

5. Near about the mean value the curve is concave towards
the x-axis, while near about the two tails it is convex to the
horizontal axis. The points of inflection (the points where the
change in curvature occurs) are at a distance of one σ from the
mean (on either side of it).

6. In the normal curve, if the nth moment is odd, the value of
this odd moment will always be zero (as the normal curve is
symmetrical, and for a symmetrical distribution the value of an odd
moment will always be zero, for the sum of the positive deviations
will always equal the sum of the negative deviations and
thus they will cancel each other). If the nth moment is even, we have

$$\mu_n = \frac{n!}{2^{n/2}\left(\frac{n}{2}\right)!}\;\sigma^n \quad \text{(where n is even)}$$

It follows that $\beta_1 = \dfrac{\mu_3^2}{\mu_2^3} = 0$, i.e., skewness is
zero, $\beta_2 = \dfrac{\mu_4}{\mu_2^2} = 3$, and $\gamma_2 = 0$, i.e., the normal
curve has zero kurtosis. In other words, the normal distribution is mesokurtic.
7. The mean deviation of the normal curve is
$\sqrt{2/\pi}\;\sigma = 0.7979\,\sigma$, or approximately $\tfrac{4}{5}\sigma$.

8. The first and third quartiles are equidistant from the mean
and are at a distance of 0.6745σ from it. Thus the semi-interquartile
range is equal to 0.6745 of the standard deviation.

9. Of all the properties of the normal curve the
most important is the area relationship. The area
lying between the curve, the horizontal x-axis and any two
ordinates is said to be the area under the curve and is equal
to the probability for the interval marked on the x-axis by the two
ordinates. The area lying between the mean ordinate and
an ordinate at a certain sigma distance from the mean will
always be the same proportion of the total area of the curve.
Thus the area enclosed between the mean ordinate and an ordinate at
one sigma distance from the mean is always 34.134% of the total area of
the curve. It means that the area lying between ± one σ
from the mean, i.e., between two ordinates at one sigma
distance from the mean on either side, will always be 68.268%
or approx. 68.27% of the total area. Similarly the area lying
between two ordinates at ± 0.6745σ from the mean will
always be 50% of the whole area. Ordinates at ± 1.96σ,
± 2σ, ± 2.5758σ and ± 3σ from the mean will respectively
cover 95%, 95.45%, 99% and 99.73% of the total area. It
shows that practically all observations of a normal frequency
distribution lie within a range of ± 3σ, or six standard
deviations.
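The area relationships quoted above can be reproduced without tables. The sketch below is an illustration only: it assumes the standard identity Φ(z) = ½[1 + erf(z/√2)] for the cumulative normal area, and the helper names `phi` and `area_within` are hypothetical.

```python
import math

def phi(z):
    """Cumulative area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def area_within(k):
    """Proportion of the total area lying between mean - k*sigma and mean + k*sigma."""
    return phi(k) - phi(-k)

# Percentage of the total area enclosed by ordinates at +/- k sigma.
for k in (0.6745, 1.0, 1.96, 2.0, 2.5758, 3.0):
    print(k, round(100.0 * area_within(k), 2))
```

The printed percentages should agree with the figures 50, 68.27, 95, 95.45, 99 and 99.73 to the accuracy stated in the text.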

Special tables (both of ordinates and areas) have been
prepared by mathematicians which have greatly reduced the
calculation work in the determination of the area lying between
two ordinates. These tables have been prepared in terms of
normal deviates. A normal deviate is a variate in standard
deviation units, i.e., $\dfrac{X - \bar{X}}{\sigma}$. It is of great significance as
this variate is independent of the actual unit of measurement
attached to the original data. On account of this very
important property it is used as the basis of the tables of areas
and ordinates of the normal curve.

Fitting a Normal Curve to an Observed Distribution

If the observed distribution appears to be reasonably
symmetrical and bell-shaped, it may be worth while to attempt
to fit a normal curve to it. The normal curve idealizes the
observational data and smoothes out the irregularities due to
sampling fluctuations. In case the fit is good, the statisticians
proceed to draw various inferences about the behaviour of
samples from a normal parent population, and can confidently
feel that their assumptions apply reasonably well to the actual
population sampled.

Fig. 14.2

In fitting the curve, the mean and standard deviation of
the observed distribution are treated as the population mean
and population standard deviation.

In fitting a normal curve, either the ordinate heights are 
determined or the areas, i.e , proportions of the expected 
frequencies, are computed. 

(i) Procedure for Ordinates

Find the $\bar{X}$, σ and N, and the class interval i, if any, of the
observed distribution. Then calculate $Y_0 = \dfrac{N i}{\sigma\sqrt{2\pi}}$, which will give the
ordinate at the mean.

For the heights at other X values, find out the normal deviates,
viz., $x' = \dfrac{X - \bar{X}}{\sigma}$, for different values of X and then by consulting
the table of ordinates find the ordinate expressed as a decimal
fraction of $Y_0$. Multiply these individual tabular values by
$Y_0$ and thus the ordinate heights at different values will be
obtained.

(ii) Procedure for Areas

Find the tabular values of the different normal deviates by
consulting the table of areas, and find the per cent of area in
each class. Multiply it by N and the figures so obtained will
be the expected frequencies of each class.

TABLE 14.1

Area between the maximum ordinate and ordinates drawn
at varying distances from the maximum ordinate

  (X - X̄)/σ     Area        (X - X̄)/σ     Area
    0.0        .00000         2.0        .47725
    0.1        .03983         2.1        .48214
    0.2        .07926         2.2        .48610
    0.3        .11791         2.3        .48928
    0.4        .15542         2.4        .49180
    0.5        .19146         2.5        .49379
    0.6        .22575         2.6        .49534
    0.7        .25804         2.7        .49653
    0.8        .28814         2.8        .49744
    0.9        .31594         2.9        .49813
    1.0        .34134         3.0        .49865
    1.1        .36433         3.1        .49903
    1.2        .38493         3.2        .49931
    1.3        .40320         3.3        .49952
    1.333      .40824         3.4        .49966
    1.4        .41924         3.5        .49977
    1.5        .43319         3.6        .49984
    1.6        .44520         3.7        .49989
    1.7        .45543         3.8        .49993
    1.8        .46407         3.9        .49995
    1.9        .47128         4.0        .49997
    1.96       .47500


This convenience of determining the area between any two
ordinates is of great help to us. If we know that so much
portion of the area lies between such and such limits we can
find out the probability of a certain event falling within those
limits. Let us illustrate the use of this property by examples.

Illustration :

The Delhi Municipal Committee instals 5,000 electric lamps
in the streets of the city. If these lamps have an average life
of 800 burning hours, with a standard deviation of 150 hours,
what proportion of the lamps might be expected to fail in the
first 600 hours ?

Solution :

In this case we want the area lying between the ordinates
at 0 and at 600 hours, i.e., at $\dfrac{0 - 800}{150}$ and $\dfrac{600 - 800}{150}$. Consulting
the table we find that the area between these two ordinates
is .09176 (i.e., .50000 - .40824). Hence the expected number
of failures will be .09176 of 5,000 = 459.
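As a rough check on the lamp illustration, the same area can be computed directly rather than read from the table. This is a sketch under stated assumptions: the mean is taken as 800 hours (as used in the solution), and `phi` is an illustrative helper built on the error function, not something given in the text.

```python
import math

def phi(z):
    """Cumulative area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

lamps, mean_life, sd = 5000, 800.0, 150.0

z = (600.0 - mean_life) / sd          # normal deviate, about -1.333
proportion = phi(z)                   # area to the left of 600 hours
expected_failures = lamps * proportion

print(round(z, 3), round(proportion, 4), round(expected_failures))
```

The direct computation gives about 456 lamps; the figure of 459 in the text comes from the tabulated area .40824, so a difference of a few lamps is to be expected.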
Illustration :

In the above illustration what number of lamps may be
expected to fail between 700 and 900 burning hours ?

Solution :

In this case we are interested in calculating the area
lying between $\dfrac{700 - 800}{150}$ and $\dfrac{900 - 800}{150}$, i.e., ∓ .666. Consulting
our table we find that the area between these two
ordinates is equal to .49074, i.e., (.24537 + .24537). Therefore
the expected number of failures is .49074 × 5,000 = 2,454.
Another important point which is of very great interest (in
the significance tests specially) is that from the area property of
the curve, we can find out the probability of a certain event
deviating by a stated amount in either direction from the mean
of a normal distribution. The probability of a certain event
deviating from the mean by ± 1σ is 1 out of three, by ± 2σ, 1
out of 20, and by ± 3σ, 1 out of 370.
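These round odds can be compared with the exact two-sided tail areas. The sketch below is illustrative only; `phi` and `two_tail` are hypothetical helper names.

```python
import math

def phi(z):
    """Cumulative area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def two_tail(k):
    """Probability of deviating from the mean by more than k sigma in either direction."""
    return 2.0 * (1.0 - phi(k))

for k in (1, 2, 3):
    p = two_tail(k)
    print(k, round(p, 4), "about 1 in", round(1.0 / p, 1))
```

The results are roughly 1 in 3.2, 1 in 22 and 1 in 370, so the figures quoted in the text are convenient roundings.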
Illustration :

How would you use the normal distribution to find approximately the
frequency of exactly 5 successes in 100 trials, the probability of
success in each trial being p = 0.1 ?

[M. Com., Delhi, /yjy) 

Solution :

Let n be the number of trials.

So n = 100, and q = 0.9 (i.e., 1 - p).

The given problem is a case of binomial distribution and so
mean = np or 100 × 0.1 = 10, and $\sigma = \sqrt{npq}$ or $\sqrt{100 \times 0.1 \times 0.9} = 3$.

We know that in the limiting form of $(p+q)^n$ when n is
large, the binomial distribution approaches the normal distribution,
which is a continuous distribution. So the frequency of
exactly 5 successes in the normal distribution will correspond to
the frequency of the class interval 4.5 to 5.5, and the mean and σ of
the binomial distribution will correspond to the mean and σ of
the normal distribution.

Here the lower limit of X = 4.5
and the upper limit of X = 5.5.

So the normal deviate of X (4.5), or $x' = \dfrac{X - \bar{X}}{\sigma} = \dfrac{4.5 - 10}{3} = -1.83$

and the normal deviate of X (5.5), or $x' = \dfrac{5.5 - 10}{3} = -1.5$

Now the area for the value -1.83 = .4664
and the area for the value -1.5 = .4332

So the area between -1.83 and -1.5 = .4664 - .4332 = .0332.
If N be the total number of frequencies, the frequency of the
class interval 4.5 to 5.5 = N × 0.0332. The approximate frequency
of exactly 5 successes in 100 trials, when the probability of
success in each trial is 0.1, is therefore N × .0332.
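To see how close the approximation is, the normal area for the interval 4.5 to 5.5 can be set beside the exact binomial term. This is a minimal sketch, not part of the original solution; `phi` is an illustrative helper and `math.comb` requires Python 3.8 or later.

```python
import math

def phi(z):
    """Cumulative area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, p = 100, 0.1
mean = n * p                             # 10
sigma = math.sqrt(n * p * (1.0 - p))     # 3

# Normal approximation: area over the class interval 4.5 to 5.5.
lower = (4.5 - mean) / sigma
upper = (5.5 - mean) / sigma
approx = phi(upper) - phi(lower)

# Exact binomial probability of exactly 5 successes.
exact = math.comb(n, 5) * p**5 * (1.0 - p)**95

print(round(approx, 4), round(exact, 4))
```

Both values are close to .033, which supports the use of N × .0332 as the approximate frequency.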

Illustration :

If two normal universes A and B have the same total frequencies,
but the σ of the universe A is x times that of the
universe B, show that the maximum frequency of universe A is
1/x that of universe B.

(I.A.S., 1947)

Let the distributions A and B have the means $M_1$ and $M_2$
respectively, xσ and σ respectively as their standard deviations,
and let the total frequency in both cases be N.

The normal distribution of A will be given by

$$Y = \frac{N}{x\sigma\sqrt{2\pi}}\; e^{-\frac{(X - M_1)^2}{2x^2\sigma^2}}$$

and the normal distribution of B is given by

$$Y = \frac{N}{\sigma\sqrt{2\pi}}\; e^{-\frac{(X - M_2)^2}{2\sigma^2}}$$

Maximum frequency always corresponds to the mean ordinate
multiplied by the length of the interval, say, i (for both the
universes) ; so for the universe A,

$$Y_0 = \frac{N i}{x\sigma\sqrt{2\pi}} = \frac{1}{x}\cdot\frac{N i}{\sigma\sqrt{2\pi}}$$

and similarly for the universe B,

$$Y_0 = \frac{N i}{\sigma\sqrt{2\pi}}$$

Thus we find that the maximum frequency of universe A =
$\dfrac{1}{x}$ × maximum frequency of universe B, or the maximum
frequency of universe A is 1/x that of universe B.

Illustration :

Draw on the same graph the normal curve

$$y = \frac{1}{\sigma\sqrt{2\pi}}\; e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$

when (i) μ = 0, σ = 1 ; (ii) μ = 0, σ = 2 ;
(iii) μ = 2, σ = 1 ; (iv) μ = 2, σ = 2.
You are not expected to plot the points from the table.

(A/Xm, OeM» t t$$4 

Plotting the normal curve for (i) :

$$y = \frac{1}{\sqrt{2\pi}}\; e^{-\frac{x^2}{2}}$$

Taking logs,

$$\log y = -\tfrac{1}{2}(\log 2 + \log \pi) - \frac{x^2}{2}\log e$$

$$= -\tfrac{1}{2}(0.3010 + 0.4972) - \frac{x^2}{2}(0.4343)$$

$$= -\tfrac{1}{2}(0.7982) - \frac{x^2}{2}(0.4343)$$

$$= -0.3991 - x^2(0.2171)$$

Putting x = 0, ±1, ±2, ±3 we get

   x           0          ±1          ±2          ±3
   log y    -0.3991     -0.6162     -1.2675     -2.3530
   y         0.3989      0.2419      0.0539      0.0044

(In characteristic and mantissa form these logarithms are
$\bar{1}.6009$, $\bar{1}.3838$, $\bar{2}.7325$ and $\bar{3}.6470$.)

Thus when these values of x and y are plotted on a graph paper,
we will get the normal curve corresponding to (i) of the problem.


(ii) At the maximum ordinate x = 0 (as σ = 2), we get
$y = \dfrac{1}{2\sqrt{2\pi}}$, being just half of $\dfrac{1}{\sqrt{2\pi}}$, the maximum ordinate of
case (i).

Similarly, the ordinates at ± 1σ, ± 2σ and ± 3σ will also be
respectively half of those of case (i).

As σ = 2,

   x     0         ±2        ±4        ±6
   y     0.1995    0.1210    0.0269    0.0022

Plotting these figures of x and y on the graph, the normal
curve corresponding to (ii) will be obtained.


(iii)

$$y = \frac{1}{\sqrt{2\pi}}\; e^{-\frac{(x - 2)^2}{2}}$$

The only difference between case (i) and this case
(iii) is that whereas in the former the power of e is $-\dfrac{x^2}{2}$,
here it is $-\dfrac{(x - 2)^2}{2}$. This means that the values of y at

   x = -3  -2  -1   0   1   2   3

of case (i) are similar to the values at

   x = -1   0   1   2   3   4   5

of case (iii).

So plotting for

   x     -1       0        1        2        3        4        5
   y    .0044   .0539    .2419    .3989    .2419    .0539    .0044

we get the normal curve for case (iii).

(iv)

$$y = \frac{1}{2\sqrt{2\pi}}\; e^{-\frac{(x - 2)^2}{8}}$$

On a comparison of this case with case (ii) we find that the
values of y corresponding to

   x = -6  -4  -2   0   2   4   6   of case (ii) are similar to those at

   x = -4  -2   0   2   4   6   8   of case (iv).

So y = .0022   .0269   .1210   .1995   .1210   .0269   .0022

So plotting the values of x and y, we would get the required
graph.
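The ordinates for all four cases can also be generated directly from the equation of the curve. The following is a sketch for checking the figures, not the method of the text (which works through logarithms); the names `normal_ordinate`, `cases` and `curves` are illustrative assumptions.

```python
import math

def normal_ordinate(x, mu, sigma):
    """Ordinate of the unit-area normal curve with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# (mu, sigma) for the four cases of the illustration.
cases = {"(i)": (0, 1), "(ii)": (0, 2), "(iii)": (2, 1), "(iv)": (2, 2)}

curves = {}
for label, (mu, sigma) in cases.items():
    # Points at the mean and at 1, 2 and 3 sigma on either side of it.
    xs = [mu + k * sigma for k in range(-3, 4)]
    curves[label] = [(x, round(normal_ordinate(x, mu, sigma), 4)) for x in xs]
    print(label, curves[label])
```

Because the values are rounded rather than read from four-figure logarithms, an ordinate may differ from the figures above by one unit in the last decimal place.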

Illustration :

It is known that the weight of a group of 10,000 persons
is distributed normally. Draw up a table showing the number
of persons you expect within the range mean ± 3σ in various
groups of class intervals of 0.5σ.

Solution : (See the table below.)

We notice here that the total of the fitted frequencies is 9,973
and not 10,000. The reason is obvious. We have not
included the cases lying below -3σ and above +3σ, the probability
of each of which is .00135. In other words, both
considered together come to .0027, and if this is multiplied by
10,000, then we get the figure 27. When this figure of 27 is
added to 9,973 we get a total figure of exactly 10,000.



Let mean = X̄

                              Normal        Area to the left     Area of      Expected
   Class intervals            deviate at    for (-) and to the   the class    frequency
                              upper limit   right for (+)        interval     (N = 10,000)

   Below X̄ - 3σ                 -3            .00135               -            -
   X̄ - 3.0σ to X̄ - 2.5σ         -2.5          .0062                .00485       48.5
   X̄ - 2.5σ to X̄ - 2.0σ         -2            .0228                .0166        166
   X̄ - 2.0σ to X̄ - 1.5σ         -1.5          .0668                .0440        440
   X̄ - 1.5σ to X̄ - 1.0σ         -1            .1587                .0919        919
   X̄ - 1.0σ to X̄ - 0.5σ         -.5           .3085                .1498        1,498
   X̄ - 0.5σ to X̄                 0            .5000                .1915        1,915
   X̄ to X̄ + 0.5σ                 .5           .3085                .1915        1,915
   X̄ + 0.5σ to X̄ + 1.0σ          1            .1587                .1498        1,498
   X̄ + 1.0σ to X̄ + 1.5σ          1.5          .0668                .0919        919
   X̄ + 1.5σ to X̄ + 2.0σ          2            .0228                .0440        440
   X̄ + 2.0σ to X̄ + 2.5σ          2.5          .0062                .0166        166
   X̄ + 2.5σ to X̄ + 3.0σ          3            .00135               .00485       48.5

   Total                                                                        9,973


It may be noted that instead of finding the area to the
left for (-) and to the right for (+), the total area for each
of the normal standard deviates may be found, and with it
also the area of each class interval can be similarly calculated.
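The alternative mentioned above, working with the total (cumulative) area at each class boundary, can be sketched as follows. This is an illustration under stated assumptions: `phi` is a helper built on the error function, and the names `edges` and `freqs` are hypothetical.

```python
import math

def phi(z):
    """Cumulative area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n_persons = 10000

# Class boundaries in sigma units: -3.0, -2.5, ..., +3.0.
edges = [-3.0 + 0.5 * i for i in range(13)]

# Expected frequency of each class = N * (cumulative area at upper limit
# minus cumulative area at lower limit).
freqs = [n_persons * (phi(b) - phi(a)) for a, b in zip(edges, edges[1:])]

for (a, b), f in zip(zip(edges, edges[1:]), freqs):
    print(f"{a:+.1f} sigma to {b:+.1f} sigma : {f:8.1f}")

total = sum(freqs)
print("total within +/- 3 sigma:", round(total))   # 9973
```

The individual class frequencies agree with the table to within about one person, and the remaining 27 persons fall outside ± 3σ.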

Testing the Normality of a certain Distribution

There are two methods of testing the normality of a certain
curve. The first method involves the calculation of the
values of $\beta_1$, $\beta_2$, $\gamma_1$ and $\gamma_2$. $\beta_1$ denotes whether the curve is
symmetrical or skewed. It is zero when the curve is symmetrical.
$\beta_2$ shows the peakedness of the curve. The peak
is normal, or the curve is mesokurtic, when $\beta_2 = 3$. If $\beta_2$ is
less than three, the curve is described as platykurtic and if
it is more than three, then it is called leptokurtic. $\gamma_1$ also is a
measure of whether or not the distribution is symmetrical.
$\gamma_2$ measures the departure of peakedness from normality. The
two standard errors of $\gamma_1$ and $\gamma_2$ are used as follows : if the values of
$\gamma_1$ and $\gamma_2$ are less than twice their standard errors, then the
distribution is not significantly different from the normal form.
If they are greater than twice their standard errors, the distribution
is not normal.


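$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} ; \quad \gamma_1 = \sqrt{\beta_1} ; \quad \text{s.e. of } \gamma_1 = \sqrt{\frac{6}{N}}$$

$$\beta_2 = \frac{\mu_4}{\mu_2^2} ; \quad \gamma_2 = \beta_2 - 3 ; \quad \text{s.e. of } \gamma_2 = \sqrt{\frac{24}{N}}$$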


The second method is the fitting of the normal curve and
testing the goodness of fit by the $\chi^2$ method. The first method
is illustrated in the following example.

Testing the Normality by the 1st Method

Illustration :

The following is the frequency distribution of the heights of 1,000
students of a college :

   Heights in inches       Frequency
      59.5 - 60.5               2
      60.5 - 61.5               9
      61.5 - 62.5              28
      62.5 - 63.5              75
      63.5 - 64.5             125
      64.5 - 65.5             200
      65.5 - 66.5             214
      66.5 - 67.5             160
      67.5 - 68.5             110
      68.5 - 69.5              50
      69.5 - 70.5              20
      70.5 - 71.5               5
      71.5 - 72.5               2
                            -----
                            1,000

Test the normality of the distribution.




Solution :

   Mid-     Deviation
   value    from 66       f        fd       fd²       fd³        fd⁴
   (X)      (d)

    60        -6           2       -12        72      -432      2,592
    61        -5           9       -45       225    -1,125      5,625
    62        -4          28      -112       448    -1,792      7,168
    63        -3          75      -225       675    -2,025      6,075
    64        -2         125      -250       500    -1,000      2,000
    65        -1         200      -200       200      -200        200
    66         0         214         0         0         0          0
    67         1         160       160       160       160        160
    68         2         110       220       440       880      1,760
    69         3          50       150       450     1,350      4,050
    70         4          20        80       320     1,280      5,120
    71         5           5        25       125       625      3,125
    72         6           2        12        72       432      2,592

   Total               1,000      -197     3,687    -1,847     40,467

$$\nu_1 = \frac{\Sigma fd}{N} = \frac{-197}{1,000} = -.197 ; \quad \nu_2 = \frac{\Sigma fd^2}{N} = \frac{3,687}{1,000} = 3.687$$

$$\nu_3 = \frac{\Sigma fd^3}{N} = \frac{-1,847}{1,000} = -1.847 ; \quad \nu_4 = \frac{\Sigma fd^4}{N} = \frac{40,467}{1,000} = 40.467$$




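$$\mu_2 = \nu_2 - \nu_1^2 - \tfrac{1}{12} \quad \text{(corrected)}$$

$$= 3.687 - .039 - .083 = 3.565$$

$$\mu_3 = \nu_3 - 3\nu_2\nu_1 + 2\nu_1^3$$

$$= -1.847 - 3 \times 3.687 \times (-.197) + 2 \times (-.197)^3 = .317$$

$$\mu_4 = \nu_4 - 4\nu_3\nu_1 + 6\nu_2\nu_1^2 - 3\nu_1^4 - \tfrac{1}{2}\mu_2 + \tfrac{7}{240} \quad \text{(corrected)}$$

$$= 40.467 - 1.455 + .860 - .005 - 1.824 + .029 = 38.072$$

(In the correction term $\tfrac{1}{2}\mu_2$ the uncorrected value
$\mu_2 = 3.687 - .039 = 3.648$ is used.)

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{(.317)^2}{(3.565)^3} = .002$$

$$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{38.072}{(3.565)^2} = 2.996$$

$$\gamma_1 = \sqrt{\beta_1} = \sqrt{.002} = .045$$

$$\gamma_2 = \beta_2 - 3 = 2.996 - 3 = -.004$$

$$\text{s.e. of } \gamma_1 = \sqrt{\frac{6}{N}} = \sqrt{\frac{6}{1,000}} = .077$$

$$\text{s.e. of } \gamma_2 = \sqrt{\frac{24}{N}} = \sqrt{\frac{24}{1,000}} = .15$$

It is clear that the curve is symmetrical, as $\gamma_1$ is less than
twice its standard error. As $\gamma_2$ is also considerably less
than twice its standard error, the curve does not depart from
normality.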
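The whole of the first method can be condensed into a short computation. The sketch below is an illustration only: it assumes the class mid-values 60 to 72, a working origin of 66, a class width of one inch, and Sheppard's corrections of 1/12 and 7/240 as used in the solution; all variable names are hypothetical.

```python
import math

mids = list(range(60, 73))                                   # class mid-values
freqs = [2, 9, 28, 75, 125, 200, 214, 160, 110, 50, 20, 5, 2]
origin = 66                                                  # assumed working origin
n = sum(freqs)

# Moments about the working origin (nu_1 ... nu_4).
nu1, nu2, nu3, nu4 = (
    sum(f * (x - origin) ** k for x, f in zip(mids, freqs)) / n
    for k in range(1, 5)
)

# Moments about the mean, before grouping corrections.
mu2_raw = nu2 - nu1 ** 2
mu3 = nu3 - 3 * nu2 * nu1 + 2 * nu1 ** 3
mu4_raw = nu4 - 4 * nu3 * nu1 + 6 * nu2 * nu1 ** 2 - 3 * nu1 ** 4

# Sheppard's corrections for a class width of 1.
mu2 = mu2_raw - 1 / 12
mu4 = mu4_raw - 0.5 * mu2_raw + 7 / 240

beta1 = mu3 ** 2 / mu2 ** 3
beta2 = mu4 / mu2 ** 2
gamma1 = math.sqrt(beta1)
gamma2 = beta2 - 3
se_gamma1 = math.sqrt(6 / n)
se_gamma2 = math.sqrt(24 / n)

print(round(mu2, 3), round(mu3, 3), round(mu4, 3))
print(round(gamma1, 3), round(gamma2, 3), round(se_gamma1, 3), round(se_gamma2, 3))
print("not significantly non-normal:",
      gamma1 < 2 * se_gamma1 and abs(gamma2) < 2 * se_gamma2)
```

Small differences from the hand-worked figures in the last decimal place arise because the text rounds intermediate results.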


Advantages of the Normal Curve

In fact, without the discovery and development of the
normal curve, the theory of statistics would not have been
as advanced as we find it these days. Nearly all the natural
phenomena, with a few exceptions here and there, vary in
accordance with this fundamental distribution. The heights
of people, the lengths of trees, the lengths of bones of mammals etc.
(all these distributions) can be described with the help of this
curve. Once we are able to identify a certain distribution with
the normal curve, we can draw upon a store of knowledge
for the purpose of analysis and interpretation of that distribution.

Another more important advantage, which has in fact contributed
to the development of the modern theory of sampling,
is that the distributions of the standard deviations, means and
other statistics of samples of a fairly large size, drawn from a
population, behave in accordance with this law. It is because
these sample statistics vary in accordance with this specific law
that certain inferences can be made about the population from
which the samples were drawn. Let the mean and the standard
deviation of a sample of 100 items drawn from a population,
say, the heights of the people in India, be 65″ and 3″ respectively.
There is a 95% probability that the mean of the population
will be between 64.4″ and 65.6″.
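The limits 64.4″ and 65.6″ follow from the standard error of the mean and the ± 1.96σ property of the normal curve. A minimal sketch of the arithmetic, with illustrative variable names:

```python
import math

sample_mean, sample_sd, n = 65.0, 3.0, 100

standard_error = sample_sd / math.sqrt(n)      # 3 / 10 = 0.3 inch
low = sample_mean - 1.96 * standard_error      # lower 95% limit
high = sample_mean + 1.96 * standard_error     # upper 95% limit

print(round(low, 1), round(high, 1))           # 64.4 65.6
```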

Poisson Distribution 

We have discussed earlier, under binomial distribution,
cases where the probability in one trial is known as p and the trial
is repeated n times. If the probability of occurrence of the event
(or success) p becomes small and the possible number of successes
n be sufficiently large and its exact value is not definitely
known, then it will be difficult to apply the binomial distribution.
Even if n is known and is a very large number, the calculations
involved will be long. Such a case will arise in connection
with rare events, e.g., the number of persons killed per day in
road accidents, the number of mistakes per page in the final proof
of a book etc. The occurrence of such events is not haphazard.
Quetelet and Von Bortkiewicz pointed out that their behaviour
can also be explained by a mathematical law. Poisson, in 1837,
obtained a limiting form of the binomial distribution under
the conditions that p becomes small and n becomes large, such that
the product np is a finite value. This mathematical limit was
later found useful by Von Bortkiewicz in explaining the
behaviour of such rare events and was given the name Poisson
Distribution.

The Poisson distribution, or the probabilities of occurrence
of various rare events (successes) 0, 1, 2, 3, ..., are given as
follows :

   No. of Successes (x) :   0          1            2                         3                         ...   r                         ...

   Probabilities (p)    :   $e^{-m}$   $m e^{-m}$   $\dfrac{m^2 e^{-m}}{2!}$   $\dfrac{m^3 e^{-m}}{3!}$   ...   $\dfrac{m^r e^{-m}}{r!}$   ...

where e = 2.718 (base of the Napierian Logarithm) and m is
the constant for a given distribution, but changes from one
distribution to the next. It is termed the parameter of the
distribution and is equal to the arithmetic mean or the square
of the standard deviation of the distribution.

The binomial distribution is completely known if two things,
p (the probability in one trial) and n (the total number of
trials), are known. Similarly the normal distribution is known when
two things, i.e., $\bar{X}$ (the mean) and σ (the standard deviation),
are known. But the Poisson distribution is completely known
when only one value m (the mean of the distribution) is known.
As in the binomial distribution, the variate of the Poisson distribution
is a discrete one. It is the number of successes (x) and
has only integral values. But the highest value for x may be
sufficiently large and its exact value may not be known in some
cases. In the theoretical distribution the value of x approaches
infinity.


Utility of Poisson Distribution 

It is clear from the above that the Poisson distribution can
explain the behaviour of the discrete variate arising in a case
where the probability of occurrence of the event is small and the
total number of possible events is sufficiently large. It has been
applied in connection with deaths by rare diseases or events
like road accidents or horse kicks. It has also been found
useful in connection with several biological problems, problems
connected with telephone traffic, quality control and sampling
etc.


Properties of the Poisson Distribution

The Poisson distribution is a distribution formed by a discrete
variate with a long tail towards the right hand side. The various
constants of this distribution (with parameter m) are as
follows :

(1) Mean of the distribution is m.

(2) Standard deviation is $\sqrt{m}$.

(3) Skewness, given by $\beta_1$, is $\dfrac{1}{m}$, i.e., as the value of m
becomes larger and larger the distribution becomes less and
less skewed.

(4) Kurtosis, given by $\gamma_2$, is also $\dfrac{1}{m}$. Thus as the value of
m becomes larger and larger $\gamma_2$ approaches zero and the Poisson
distribution approaches the normal distribution.

(5) The first four moments about the mean are :

$\mu_1 = 0$, $\mu_2 = m$, $\mu_3 = m$, $\mu_4 = 3m^2 + m$.

(6) The β coefficients of the distribution are :

$\beta_1 = \dfrac{1}{m}$ and $\beta_2 = 3 + \dfrac{1}{m}$.

Fitting a Poisson Distribution 

The two examples given below will explain the method of
fitting the Poisson distribution (i) when the value of m is given and
(ii) when m is to be determined from the given data.

(i) The mean of a distribution is 2.0. What are the probabilities
corresponding to various successes 0, 1, 2, ..., assuming
that the Poisson distribution explains the data ? If the total
frequency is 1,000, what are the various frequencies ?

Solution : The probability of r successes is $P(r) = \dfrac{e^{-m} m^r}{r!}$,
so that successive probabilities are connected by the relation

$$P(r) = \frac{m}{r}\,P(r-1) \qquad (1)$$

Now $P(0) = e^{-m} = e^{-2}$ (in the present case)

∴ $\log P(0) = -2 \log e = -2(.4343) = -.8686 = \bar{1}.1314$

∴ P(0) = 0.1353

From the relation (1) above, putting r = 1, 2, 3 etc.

$P(1) = \dfrac{m}{1}P(0) = 2 \times P(0) = 2 \times 0.1353 = 0.2706$

$P(2) = \dfrac{m}{2}P(1) = \dfrac{2}{2} \times P(1) = 0.2706$

$P(3) = \dfrac{m}{3}P(2) = \dfrac{2}{3}(0.2706) = 0.1804$

$P(4) = \dfrac{m}{4}P(3) = \dfrac{2}{4}(.1804) = 0.0902$

$P(5) = \dfrac{m}{5}P(4) = \dfrac{2}{5}(.0902) = 0.0361$

$P(6) = \dfrac{m}{6}P(5) = \dfrac{2}{6}(0.0361) = 0.0120$

$P(7) = \dfrac{m}{7}P(6) = \dfrac{2}{7}(0.0120) = 0.0034$

$P(8) = \dfrac{m}{8}P(7) = \dfrac{2}{8}P(7) = 0.0009$

This process may be carried to an infinite number of terms.
But practically speaking, after 10 or 15 terms, the probability
of successes will be so small that there will hardly be any
interest in calculating terms beyond that.







In the present case the Poisson probability distribution is as
follows :

   No. of Successes (x)    Probabilities (p)
        0                     0.1353
        1                     0.2706
        2                     0.2706
        3                     0.1804
        4                     0.0902
        5                     0.0361
        6                     0.0120
        7                     0.0034
        8 or more             0.0014

As the theoretical distribution has an infinite number of terms,
the last term is written as '8 or more'. Its probability has been
found by subtracting the total of the probabilities corresponding
to 0 to 7 successes, i.e., 0.9986, from the total probability, i.e., 1.

Frequencies corresponding to various successes can be found
by multiplying the respective probabilities by the total
frequency. In the present case thus we have :

   No. of Successes (x)    Frequency (f)
        0                     135.3
        1                     270.6
        2                     270.6
        3                     180.4
        4                      90.2
        5                      36.1
        6                      12.0
        7                       3.4
        8 and more              1.4
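The recurrence P(r) = (m/r) P(r-1) lends itself to a short loop. The following sketch is illustrative; the function name `poisson_probs` and the cut-off `r_max` are assumptions made for the example.

```python
import math

def poisson_probs(m, r_max):
    """Return [P(0), P(1), ..., P(r_max)] using P(r) = (m / r) * P(r - 1)."""
    p = math.exp(-m)          # P(0) = e^(-m)
    probs = [p]
    for r in range(1, r_max + 1):
        p *= m / r
        probs.append(p)
    return probs

m, total_frequency = 2.0, 1000
probs = poisson_probs(m, 7)
tail = 1.0 - sum(probs)       # probability of 8 or more successes

for r, p in enumerate(probs):
    print(r, round(p, 4), round(total_frequency * p, 1))
print("8 or more", round(tail, 4), round(total_frequency * tail, 1))
```

Carried to full accuracy the tail probability is about .0011; the figure .0014 in the table results from subtracting probabilities that had already been rounded to four places.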

(ii) The frequency distribution of the number of men
killed by the kicks of a horse in a certain Prussian army corps
in twenty years is given as below (Von Bortkiewicz's data) :

   Deaths (x)       0     1     2     3     4

   Frequency (f)   109    65    22     3     1

Fit a Poisson distribution to the data.

Solution : Here the mean of the distribution is not given and
has to be calculated from the given data.

From the data we have N = Σf = 200 and Σfx = 122.

Mean = $\dfrac{122}{200}$ = 0.61 = m

∴ The frequency corresponding to any number of successes, say r, is
given by $F(r) = N \times P(r) = \dfrac{N e^{-m} m^r}{r!}$, or by $\dfrac{200 \times e^{-.61}(.61)^r}{r!}$
in the present case. Here also the procedure followed in the
last case can be adopted. But that will involve long multiplications.
Here we adopt a different method :

$$\log F(r) = \log \frac{N e^{-m} m^r}{r!} = \log N - m \log e + r \log m - \log r!$$

$$= \log 200 - 0.61 \times 0.4343 + r \log 0.61 - \log r! \quad \text{(in the present case)}$$

$$= 2.3010 - 0.2649 + r(\bar{1}.7853) - \log r!$$

$$= 2.0361 + r(\bar{1}.7853) - \log r!$$

Putting r = 0, 1, 2, 3, 4 etc., we have

   No. of Successes     0         1         2         3        4 and more

   log F(r)           2.0361    1.8214    1.3057    0.6139        -

   F(r)               108.7      66.3      20.2       4.1        0.7

Note : (1) log F(r) cannot be determined for '4 and more'.
The frequency corresponding to this is determined by subtracting
from the total frequency, i.e., 200, the total of the frequencies up to
3, i.e., 199.3.

(2) Here we observe that the given frequencies and
the frequencies obtained by the Poisson law are quite close to
one another. This confirms the fact stated earlier that the Poisson
distribution is the law explaining the behaviour of rare events
such as men dying by horse kicks etc.
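The fit can also be carried out without logarithms. This is a minimal sketch using the data quoted above; the variable names are illustrative assumptions.

```python
import math

deaths = [0, 1, 2, 3, 4]
observed = [109, 65, 22, 3, 1]

n = sum(observed)                                        # 200
m = sum(x * f for x, f in zip(deaths, observed)) / n     # mean = 0.61

# Fitted frequencies F(r) = N * e^(-m) * m^r / r! for r = 0, 1, 2, 3.
fitted = [n * math.exp(-m) * m ** r / math.factorial(r) for r in range(4)]
rest = n - sum(fitted)                                   # '4 and more'

for r, (obs, fit) in enumerate(zip(observed, fitted)):
    print(r, obs, round(fit, 1))
print("4 and more", observed[4], round(rest, 1))
```

The fitted frequencies come out close to 108.7, 66.3, 20.2, 4.1 and 0.7, in agreement with the logarithmic working.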

Problems Involving the use of Poisson Distribution 

(i) The mean of the Poisson distribution is 2.5. Find
the other constants.

Solution : The mean of the Poisson distribution is m,
i.e., m = 2.5. The other constants are :

(1) Standard deviation = $\sqrt{m} = \sqrt{2.5} = 1.58$

(2) $\mu_1 = 0$

(3) $\mu_2 = m = 2.5$

(4) $\mu_3 = m = 2.5$

(5) $\mu_4 = 3m^2 + m = 3 \times 6.25 + 2.5 = 21.25$


(ii) In a certain Poisson frequency distribution, the frequency
corresponding to 2 successes is half the frequency
corresponding to 3 successes. Find its mean and standard
deviation.

Solution : Let the parameter of the distribution be m and the
total frequency N.

Frequency corresponding to 2 successes = $N\,\dfrac{e^{-m} m^2}{2!}$

and frequency corresponding to 3 successes = $N\,\dfrac{e^{-m} m^3}{3!}$

∴ we have

$$N\,\frac{e^{-m} m^2}{2!} = \frac{1}{2} \times N\,\frac{e^{-m} m^3}{3!}$$

or $1 = \dfrac{m}{6}$, or m = 6, i.e., Mean = 6

Standard deviation = $\sqrt{m} = \sqrt{6}$ = 2.45 approx.

Note : When m is known the other remaining constants and
also the probabilities corresponding to various numbers of
successes can also be determined.


(iii) A firm produces articles of which 0.1 per cent are
usually defective. It packs them in cases each containing
500 articles. If a wholesaler purchases 100 such cases, how
many cases are expected to be free of defective items and how
many are expected to have one defective each?

Solution: Here $p = 0.1\%$, i.e., $0.001$, and $n = 500$. With
these constants the Binomial distribution can be applied, but this
will involve long calculations. In this case, as $p$ is small and $n$ is
sufficiently large, we can also apply the Poisson distribution. The
parameter of the Poisson distribution is $m = np = 0.001 \times 500 = 0.5$.
∴ The probability of 0, 1, 2, 3, ... defectives is given by

$\dfrac{e^{-m} m^r}{r!}$, where $r = 0, 1, 2, \ldots$

The probability that a case contains no defective
$= e^{-0.5} = 0.6065$

and the probability that a case contains one defective
$= \dfrac{0.5\, e^{-0.5}}{1!} = 0.30325$.

In a lot of 100 cases the number of cases expected to
have no defective article $= 100 \times 0.6065 = 61$ approx. The
number of cases having one defective $= 100 \times 0.30325 = 30$ approx.
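
The arithmetic of problem (iii) can be checked with a short Python sketch. The function name `poisson_pmf` and the variable names are illustrative choices, not notation from the text; the calculation itself is only the Poisson term $e^{-m} m^r / r!$ with $m = np = 0.5$.

```python
import math

def poisson_pmf(r: int, m: float) -> float:
    """Probability of exactly r occurrences when the Poisson mean is m."""
    return math.exp(-m) * m**r / math.factorial(r)

p_defective = 0.001        # 0.1 per cent of articles are defective
articles_per_case = 500
cases = 100

m = p_defective * articles_per_case           # Poisson parameter np = 0.5
expected_none = cases * poisson_pmf(0, m)     # cases with no defective article
expected_one = cases * poisson_pmf(1, m)      # cases with exactly one defective

print(round(expected_none, 2), round(expected_one, 2))
```

Rounded to whole cases, the two figures agree with the 61 and 30 obtained above.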

(iv) It is 1 in 1,000 that a birth is a case of twins. If there
are 100 births in a town in one day, what is the chance that
two or more pairs of twins are born? Compare the results
obtained by using (a) Binomial distribution, (b) Poisson
distribution.

Solution: (a) Here probability of a twin birth $(p) = \dfrac{1}{1000} = .001$

and the maximum number of possible twin births $(n) = 100$.

∴ $q = 1 - p = 1 - .001 = .999$

∴ The probabilities of 0, 1, 2, ... twins are given by the
1st, 2nd, 3rd, ... terms in the expansion of $(.999 + .001)^{100}$.

∴ Probability (no twin) $= (.999)^{100} = .9046$
(simplifying with the help of logarithms)

and Probability (one twin) $= 100\,(.999)^{99}(.001) = .0906$.

Now Probability of two or more pairs of twin births
$= 1 - \{\text{Probability (no twin)} + \text{Probability (one twin)}\}$
$= 1 - (0.9046 + 0.0906) = 1 - 0.9952 = 0.0048$

∴ The required probability using the binomial distribution
$= 0.0048$.

(b) Parameter of Poisson distribution $= np = 100 \times .001 = 0.1$

∴ The probabilities of 0, 1, 2, 3, ... twin births are given
by the various terms obtained by putting $r = 0, 1, 2, 3, \ldots$ in
$\dfrac{e^{-m} m^r}{r!} = \dfrac{e^{-0.1} (0.1)^r}{r!}$ (in the present case).

∴ Probability (no twins) $= (2.718)^{-0.1} = 0.9047$

and Probability (one twin) $= (0.1)(0.9047) = 0.0905$ approx.

∴ The required probability of two or more twin births
$= 1 - (0.9047 + 0.0905) = 1 - 0.9952 = 0.0048$.

Comparing the two results we find that the two methods
give identical results. It may be noted here that if the results
are calculated correct up to 5 decimal places the difference will
be only in the 5th place of decimal.
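
The comparison in problem (iv) can be repeated at full floating-point precision. The sketch below is a minimal illustration; the helper names `binomial_pmf` and `poisson_pmf` are assumptions made for this example, and `math.comb` requires Python 3.8 or later.

```python
import math

def binomial_pmf(r: int, n: int, p: float) -> float:
    """General term of the binomial expansion (q + p)^n."""
    return math.comb(n, r) * p**r * (1 - p)**(n - r)

def poisson_pmf(r: int, m: float) -> float:
    """General term of the Poisson distribution with parameter m."""
    return math.exp(-m) * m**r / math.factorial(r)

n, p = 100, 0.001          # 100 births, chance of twins 1 in 1,000
m = n * p                  # Poisson parameter np = 0.1

# P(two or more) = 1 - P(0) - P(1) under each distribution
binom_two_or_more = 1 - binomial_pmf(0, n, p) - binomial_pmf(1, n, p)
poisson_two_or_more = 1 - poisson_pmf(0, m) - poisson_pmf(1, m)

print(f"binomial: {binom_two_or_more:.5f}  Poisson: {poisson_two_or_more:.5f}")
```

At this precision the two answers are about 0.00464 and 0.00468, so they part company only in the fifth decimal place, as remarked above. The figure 0.0048 in the worked solution reflects the rounding of four-figure logarithm tables.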








Calculation of A. Mean and Standard Deviation of
Poisson Distribution

Before giving the proof, a result of higher mathematics
which is used in this connection may be given:

$e^m = 1 + m + \dfrac{m^2}{2!} + \dfrac{m^3}{3!} + \ldots$ to an infinite number of terms.

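Mean:

The Poisson distribution is given as:

No. of successes $(x)$: 0, 1, 2, 3, 4, ...

Probability $(p)$: $e^{-m}$, $m e^{-m}$, $\dfrac{m^2 e^{-m}}{2!}$, $\dfrac{m^3 e^{-m}}{3!}$, $\dfrac{m^4 e^{-m}}{4!}$, ...

The A. mean is given by $\dfrac{\Sigma x p}{\Sigma p}$.

Now $\Sigma p = e^{-m} + m e^{-m} + \dfrac{m^2 e^{-m}}{2!} + \dfrac{m^3 e^{-m}}{3!} + \ldots$

$= e^{-m} \left( 1 + m + \dfrac{m^2}{2!} + \dfrac{m^3}{3!} + \ldots \right)$

$= e^{-m} \cdot e^{m} = 1$.

$\Sigma x p = 0 \cdot e^{-m} + 1 \cdot m e^{-m} + 2 \cdot \dfrac{m^2 e^{-m}}{2!} + 3 \cdot \dfrac{m^3 e^{-m}}{3!} + 4 \cdot \dfrac{m^4 e^{-m}}{4!} + \ldots$

$= 0 + m e^{-m} + m^2 e^{-m} + \dfrac{m^3 e^{-m}}{2!} + \dfrac{m^4 e^{-m}}{3!} + \ldots$

$= m e^{-m} \left( 1 + m + \dfrac{m^2}{2!} + \dfrac{m^3}{3!} + \ldots \right)$

$= m e^{-m} \cdot e^{m} = m$.

∴ The mean of the Poisson distribution $= \dfrac{\Sigma x p}{\Sigma p} = m$.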

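Standard deviation:

Let the moments be taken from the origin, i.e., 0. Then $\mu_2 = V_2 - V_1^2$,
where $V_1 =$ A. Mean and $V_2 = \dfrac{\Sigma x^2 p}{\Sigma p}$.

$\Sigma x^2 p = 0 + 1^2 \cdot m e^{-m} + 2^2 \cdot \dfrac{m^2 e^{-m}}{2!} + 3^2 \cdot \dfrac{m^3 e^{-m}}{3!} + 4^2 \cdot \dfrac{m^4 e^{-m}}{4!} + \ldots$

$= 0 + m e^{-m} + 2 m^2 e^{-m} + 3 \cdot \dfrac{m^3 e^{-m}}{2!} + 4 \cdot \dfrac{m^4 e^{-m}}{3!} + \ldots$

$= m e^{-m} \left[ 1 + 2m + 3 \dfrac{m^2}{2!} + 4 \dfrac{m^3}{3!} + \ldots \right]$

Breaking each term within the brackets into two parts
we get:

$= m e^{-m} \left\{ \left( 1 + m + \dfrac{m^2}{2!} + \dfrac{m^3}{3!} + \ldots \right) + \left( m + 2 \dfrac{m^2}{2!} + 3 \dfrac{m^3}{3!} + \ldots \right) \right\}$

$= m e^{-m} \left[ e^{m} + m \left( 1 + m + \dfrac{m^2}{2!} + \ldots \right) \right]$

$= m e^{-m} \left[ e^{m} + m e^{m} \right]$

$= m e^{-m} e^{m} (1 + m) = m (1 + m) = m + m^2$

∴ $V_2 = \dfrac{\Sigma x^2 p}{\Sigma p} = \dfrac{m + m^2}{1} = m + m^2$

∴ $\mu_2 = V_2 - V_1^2 = (m + m^2) - (m)^2 = m$ (as $V_1 =$ A. Mean $= m$)

∴ Standard deviation $= \sqrt{m}$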
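
The two results just proved can also be checked numerically by summing the series term by term. This is a minimal sketch under stated assumptions: the function name `poisson_moments`, the cut-off of 100 terms and the trial value $m = 2.5$ are choices made for the illustration, and the truncated tail is negligible for a parameter of this size.

```python
import math

def poisson_moments(m: float, terms: int = 100):
    """Sum p, x*p and x^2*p over the first `terms` values of x."""
    p = math.exp(-m)              # probability of 0 successes
    total = mean = second = 0.0
    for x in range(terms):
        total += p                # running value of sum(p)
        mean += x * p             # running value of sum(x p)
        second += x * x * p       # running value of sum(x^2 p)
        p *= m / (x + 1)          # next term: P(x+1) = P(x) * m / (x+1)
    variance = second - mean**2   # second moment about the mean
    return total, mean, variance

total, mean, variance = poisson_moments(2.5)
print(total, mean, variance, math.sqrt(variance))
```

Up to floating-point rounding, the sum of the probabilities is 1 and both the mean and the variance equal $m$, so the standard deviation is $\sqrt{m}$.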

Proof that Poisson Distribution is a Limiting Case of
Binomial Distribution

Before giving the proof it will not be out of place to point
out two results of higher mathematics which are used in this
connection.

(1) The exact value of $e$, the base of Napierian logarithms,
is given as $\underset{n \to \infty}{\mathrm{Lt}} \left( 1 + \dfrac{1}{n} \right)^{n}$. It can be shown by expansion
that this is approximately equal to 2.718.

(2) We know $n! = n (n-1)(n-2) \ldots 1$. Stirling has
shown that the approximate value of $n!$ in terms of $e$ and
powers of $n$ is $\sqrt{2 \pi n}\; e^{-n} n^{n}$.










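Proof: If $p$ is the probability of success in one trial, $n$
the number of trials and $q = 1 - p$, the various terms of the
binomial distribution are given by putting $r = 0, 1, 2, 3, \ldots$
in the general term

$P(r) = \dfrac{n!}{r!\,(n-r)!}\; p^{r} (1 - p)^{n-r}$.

If we can show that under certain conditions this term
approaches the general term of the Poisson distribution, i.e.,
$\dfrac{e^{-m} m^{r}}{r!}$, we can say that the Binomial distribution tends to the Poisson
distribution under those conditions. The conditions, as stated
earlier, are that $n$ approaches a very large number, i.e., $\infty$, and
$p \to 0$ such that $np = m$, a finite value.

Putting $p = \dfrac{m}{n}$ in $P(r)$, we get

$P(r) = \dfrac{n!}{r!\,(n-r)!} \left( \dfrac{m}{n} \right)^{r} \left( 1 - \dfrac{m}{n} \right)^{n-r} = \dfrac{m^{r}}{r!} \cdot \dfrac{n!}{(n-r)!\; n^{r}} \left( 1 - \dfrac{m}{n} \right)^{n-r}$.

Applying Stirling's approximation and taking limits,

$\underset{n \to \infty}{\mathrm{Lt}}\, P(r) = \dfrac{m^{r}}{r!}\; \underset{n \to \infty}{\mathrm{Lt}}\; \dfrac{\sqrt{2 \pi n}\; n^{n} e^{-n}}{\sqrt{2 \pi (n-r)}\; (n-r)^{n-r} e^{-(n-r)}\; n^{r}} \left( 1 - \dfrac{m}{n} \right)^{n-r}$.

Now $\left( 1 - \dfrac{m}{n} \right)^{n} = \left[ \left\{ 1 + \left( -\dfrac{m}{n} \right) \right\}^{-n/m} \right]^{-m} = e^{-m}$ in the limiting case as $n \to \infty$,
by result (1), as $m$ is finite. In the same way $\left( 1 - \dfrac{r}{n} \right)^{n} \to e^{-r}$.

Collecting the factors, and noting that
$\dfrac{n^{n}}{(n-r)^{n-r}\, n^{r}} = \left( 1 - \dfrac{r}{n} \right)^{-(n-r)}$ and $\sqrt{\dfrac{n}{n-r}} = \left( 1 - \dfrac{r}{n} \right)^{-1/2}$, we have

$\underset{n \to \infty}{\mathrm{Lt}}\, P(r) = \dfrac{e^{-m} m^{r}}{r!}\; \underset{n \to \infty}{\mathrm{Lt}} \left[ e^{-r} \left( 1 - \dfrac{r}{n} \right)^{-n} \right] \left[ \left( 1 - \dfrac{r}{n} \right)^{r - \frac{1}{2}} \left( 1 - \dfrac{m}{n} \right)^{-r} \right]$.

Now the terms in the first bracket simplify to 1 in the limit
(as $\left( 1 - \dfrac{r}{n} \right)^{-n} \to e^{r}$), and the terms in the second bracket
will approach 1 as $n \to \infty$. So we get

$\underset{n \to \infty}{\mathrm{Lt}}\, P(r) = \dfrac{e^{-m} m^{r}}{r!}$.

Hence each and every term of the Binomial distribution
tends to the corresponding term of the Poisson distribution,
provided $n \to \infty$, $p \to 0$ such that $np = m$, a finite value.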
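
The limiting behaviour proved above can be watched numerically. The sketch below holds $np = m$ fixed and lets $n$ grow; the values $m = 2$ and $r = 3$, and the helper names, are arbitrary choices for the illustration only.

```python
import math

def binomial_pmf(r: int, n: int, p: float) -> float:
    """General term of the binomial distribution."""
    return math.comb(n, r) * p**r * (1 - p)**(n - r)

def poisson_pmf(r: int, m: float) -> float:
    """General term of the Poisson distribution."""
    return math.exp(-m) * m**r / math.factorial(r)

m, r = 2.0, 3                     # fixed mean np = m; term examined is P(3)
sizes = (10, 100, 1000, 10000)    # increasing numbers of trials n
gaps = []
for n in sizes:
    p = m / n                     # p shrinks as n grows, keeping np = m
    gaps.append(abs(binomial_pmf(r, n, p) - poisson_pmf(r, m)))

for n, gap in zip(sizes, gaps):
    print(n, gap)
```

Each tenfold increase in $n$ brings the binomial term closer to the Poisson term, which is the behaviour the proof describes.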




EXERCISES 


1. Explain with the help of suitable illustrations the meaning of

(a) dependent and independent events,

(b) mutually exclusive events,

(c) theorem of addition, and

(d) multiplication theorem.

2. What is a normal curve? What are its properties?

3. Explain the importance of the normal curve in statistics and its
application to economics. (M.A., Delhi, 1958)

4. Give the conditions which a variate should satisfy so as to have its
distribution as a normal distribution.

5. In a certain town the ratio of males to females is 1,000 : 987. If
this tendency is expected to continue, what is the chance that a newly
born baby is male? [11.1]

6. What is the probability that a digit selected at random from the
logarithmic table is (i) 1, and (ii) 5 or 7? [11.2]

7. The following table gives a distribution of wages:

Weekly wages 30- 35- 40- 45- 50- 55- 60- 65-70

No. of wage earners ‘I \W 230 112 30 16 7

An individual is taken at random from the above group. Find
the probability that (i) his wage was under 40, (ii) his wage was 55 or
over, (iii) his wage was either between 45-50 or 35-40.

{It Com , Delhi, 1954) [11.4]

8. A bag contains 1 red balls, 12 white balls and 4 green balls.
What is the chance that a ball drawn is white? [11.6]

9. Two cubical dice whose faces are marked with digits 1 to 6 are
thrown simultaneously. Find the probability that the sum of the
digits on the faces that turn up is 8. (M.A., Delhi, 1950) [11.8]

10. Compare the chance of throwing 5 with one die, 10 with two
dice. [11.9]

11. Find the chance of throwing 4 or 5 at least once in 5 throws with a
single die. [11.21]

12. Goddard, the captain of the West Indies team, is reported to have
observed the rule of calling "heads" every time the toss was made
during the five matches of the last test series with the Indian team.
What is the probability of his winning the toss in all the five
matches?

13. It is 8 : 5 against a person who is now 40 years old living till he is
70, and 1 : 3 against a person now 50 living till he is 80. Find the
probability that one at least of these persons will be alive 30 years
hence. (B.A., Allahabad) [11.14]

14. In tossing of a coin, find the chance of throwing (i) heads in three
successive trials, (ii) two heads and one tail in 3 trials.

fUW) 

15. Find out the probability of forming 563 and 188 with the digits
1, 2, 3, 4, 5, 6, 7, 8 and 9 when only numbers of three digits are
formed and (a) when repetitions are not allowed, (b) when
repetitions are allowed. [11.5]



16. During the war one ship in ten was sunk on the average in making
a certain voyage. What was the probability that at least three out
of a convoy of tiff ships would arrive safely?

.tt.Cum , Delhi, 1955) [11.22]

17. The following mortality table shows the number of survivors to
various ages of 100,000 newly born males:

Age

o iu m :so m w 60 70 m 90 \

Survivors

100000 TifiOl TO wm MMt) 00521 67767 41,7 W !<$f>C 2612 6

Find the probability of a newly born infant in this population
living to be HO years old. Find the probability of a 20-years old in this
population living until he is *A) years old.

H.Ctm , Delhi t*0>: [11.23]

18. Three houses of the same type were advertised to be let in a
locality. Three men made separate applications for a house. What is
the probability (i) that all 3 men made applications for the same
house, (ii) that each of the three applied for a different house, and (iii)
that 2 of them applied for the same house and the third for one of the
other houses?

(B.Com., Delhi, 1951) [11.19]

19. A can hit a target 3 times in 5 shots, B 2 times in 5 shots, C 3
times in 4 shots. They fire a volley. What is the probability that 2
shots hit?

i.WJ, P*vaX !<rt 5 ! [11.13]

20. Balls are tested by dropping from a height and measuring the height
of bounce. A ball is 'fast' if it rises above J5 inches. The average
height of bounce was tl inches. The standard deviation was 15
inches. What is the chance of getting a fast ball?

[12.19]

21. Sacks of grain packed by automatic machine loaders have an average
weight of 100 seers. Its standard deviation is 0 7*0 seers. Find the
chance of getting a bag over 101 5> seers and below 94F3 seers. If
a dealer rejects a bag below seers, how many bags do you expect to
be rejected in a lot of 1,000 bags? [12.20]

22. If two normal universes A and B have the same total frequency but
the standard deviation of universe A is K times that of universe B,
show that the maximum frequency of universe A is 1/K that of
universe B. jAAi., ty+f) [12.28]

23. Draw on the same graph the normal curves

$y = \dfrac{N}{\sigma \sqrt{2\pi}}\; e^{-\frac{(x - \mu)^2}{2 \sigma^2}}$

when il> p«'*0* W; (ii) $\mu = 0$, $\sigma = 2$; {lii| I ; and
(iv) $\mu = 2$, $\sigma = 2$.

You are not expected to plot the points from the table.

(M.Com., Delhi, 1954) [12.35]

24. Show that the mean of a binomial distribution is $np$ and the standard
deviation is $\sqrt{npq}$.

(M.Com., Delhi, t&$y) [12.1]



25. For a binomial distribution the mean is 6 and the standard deviation is
$\sqrt{2}$. Write all the terms of the distribution. [12.2]

26. The normal rate of infection of a certain disease in animals is known
to be K>%. In an experiment with 6 animals injected with a new
vaccine it was observed that none of the animals caught infection.
Calculate the probability of the observed result.

{i A S , W >) [12.4]

27. 7 coins are tossed and the number of heads noted. The experiment
is repeated 128 times and the following is the observed frequency
distribution of the 128 throws according to the number of heads:

No. of heads 0 1 2 3 4 5 6 7 Total

Throws 7 6 19 35 30 23 7 1 128

Fit a binomial distribution under the hypothesis that the coins
are unbiased.

Find the mode of the binomial distribution.

(M.Com., Delhi, 19 &) [12.9]



Chapter 15 

Sampling and Statistical Inference 


Infinite and Finite Populations

Statistical populations may be infinite or finite. Tossing
of coins is an instance of an infinite population. Complete
enumeration (or census study) is an impossibility in such a case,
as the tossing of coins has no end even if every human being
existing on earth goes on tossing continuously throughout his
life, and the process would not be exhausted even if the present
generation were joined by future generations, if any.

Contents of mines, distribution of left-handed persons, etc.,
may be cited as examples of finite populations. But here again
in very many cases complete enumeration may not be feasible
because of the vastness of such finite populations. A census of the
contents of the mines would imply complete exhaustion of the
mines, which no sane Government can even think of, though in
theory it is not at all impossible.

Sampling 

In such cases the best that an investigator can do is to
examine a relatively small number of cases (called a sample)
from out of the whole population, with the expectation that it
will throw light on the characteristics of the universe. The
study of this relatively small number of cases comes under the
theory of sampling.

One of the basic requirements of the sampling technique
is the securing of a representative sample which truly reflects the
universe of inquiry. A representative sample is obtained by
the method of random selection, wherein each item of the
universe has an equal chance or probability of being selected,
and hence no item is included or excluded because of the
presence of bias. For all the items to enjoy an equal chance of
being selected it is absolutely essential that these items should
be perfectly uniform in shape, weight and size, and that they be well
mixed in a container and then a blindfolded draw be made.

An important requirement in taking a statistical sample is
that it be a probability sample. This means that the probability
of being selected in the sample must be known for each
item in the universe. This is achieved through the application
of the mathematical theory of probability.

Actually the mathematical theory of probability or random
sampling is based upon the idea of sampling from an infinite
population, as in an infinite population the probability of the next
item being selected (sampled) remains unaffected by the inclusion
of the prior item. This element of independence can also
be introduced in a finite population by replacing each item
sampled before taking the next one in the sample. However,
if the finite population is considerably larger than the size of
the sample to be taken, the entire sample may be taken without
actually replacing the items sampled, with only a negligible
difference from the theoretical expectation.

Objects of Sampling 

Besides the attributes of adaptability, speed and economy
which necessitate the study of sampling, the following are the
two basic objects of sampling:

(1) to get a picture of the universe as clearly and precisely
as possible, and

(2) to determine the reliability of the estimates.

These objects can be attained only when the selected
samples fully represent the universe and the size of the sample
is relatively large.

Sampling Errors and Differences between Population
and Sample Measures

It has already been mentioned that even in the case of
coin-tossing or dice-throwing, which belong to infinite populations,
chance plays an important role because of the presence of
numerous independent causal forces. Consequently if a second
throw of a die is compared with its first throw, there is no
guarantee that the results of the two throws will be exactly the
same. Similarly we cannot expect two bridge hands to show
the same distribution of cards in the various suits. Exactly
for the same reason we cannot expect any two samples drawn
from the population to give identically the same result. Chance
practically guarantees that any two or more successive samples
from the same population will show some differences in their
statistical measures such as mean, standard deviation, etc. (In
view of the indispensability of the comparison between the
population and sample measurements for the determination of
the reliability of the estimates, two terms, 'parameters' and
'statistics', are generally used for the differentiation of population
measures and their estimates usually taken from samples.
Statistical measures, such as means, standard deviations, etc.,
of the populations are called parameters, and their corresponding
estimates calculated from the samples are known as
statistics.) The differences between them are technically called
sampling errors. Such errors do not refer to errors in
computation, or in measurements, or even to bias in the sampling
design. These are attributable exclusively to sampling, i.e.,
errors due to the study of a small portion of the whole universe.
Here 'chance' is rather the sole factor, while the human factor is
mainly responsible for non-sampling errors.

It means that the greater the sampling error, the more will
be the deviation or difference in the sample measures and
consequently the less the sample will agree with the universe.
Conversely, the smaller this error, the more the sample will be
representative of the universe. Thus in order to make the sample
representative, sampling error should be minimized. This can
be done by enlarging the size of the sample. The larger the size
of the sample, the greater will be the accuracy of its results, for
the accuracy of a sample increases as the square root of its size
increases, i.e., if the accuracy of a sample of a given size is to
be doubled, the size of the sample should be quadrupled.



Owing to the presence of sampling errors (also called
fluctuations of sampling) seldom shall we find perfect coincidence
between the sample statistics and the population parameters.
Yet the difference or deviation between these two is found to
be within limits, as the sampling errors are found to follow
chance laws or laws of statistical regularity. These limits are
assigned with the help of the standard error. It (the standard error)
provides a measure of the range within which a
sample statistic might deviate from the population parameter
due to sampling error. In other words, it gives the most probable
maximum and minimum limits within which the parameter
lies.

Null Hypothesis and Level of Significance 

Because of the assignment of the limits, various tests (an
important test being based on the standard error) have been evolved
to find the significance of the difference between the results
of expectation (population) and observation (sample). If the
difference between the two is expected to arise because of
chance alone (sampling), then the difference is said to be
insignificant; otherwise it is considered significant. Thus if the difference
could seldom (say, once in 100 or 1,000 trials) be as large owing
to sampling errors alone, then the difference is said to be real
in the sense that it is not likely to be due to sampling errors
alone, and the observed or sample results are said to be
significant. On the other hand, if it is found that often (say, 20
times in 100, or even once in 5) results as large could be obtained
that would be attributable to sampling errors alone, they are
said to be insignificant.

Before arriving at the conclusion that the difference is
significant, a null hypothesis is set up, and it is assumed that
there is an absence of non-sampling errors (bias, etc.) and the
difference is due to chance alone. Then the probability of
the occurrence of such a 'difference' is determined. If it is
found that such a difference could arise only 5 times in 100
such experiments, then the difference is said to be significant
at the .05 level (or the level of significance is said to be 5%).
Similarly if the difference could arise by chance only once in
100 such experiments, the difference will be said to be
significant at the 0.01 level or 1% level of significance. In case a
difference is expected to arise by chance, say, only 3 times in
1,000 experiments, i.e., at the .003 level, then the occurrence of
such a difference owing to chance will be highly improbable
and the inevitable conclusion will be that the null hypothesis
is most likely false.

It should be noted that the occurrence of a highly improbable
difference (or of a very unlikely event) does not disprove
the null hypothesis, since such improbable events do happen,
though only seldom. Again the probability of the occurrence
of the observational difference in 5 or more trials out of 100
also does not prove the hypothesis. All that can be said in this
case is that the experimental or observed difference gives no
reason to doubt the hypothesis.

Standard Error and Sampling Distribution 

The term standard error has been mentioned above in the
discussion of 'significance'. Let us consider what is
precisely meant by it (S.E.). The standard error of any measure
of a sample can be stated as the standard deviation of the sampling
distribution of that measure. It may be illustrated with reference
to the Standard Error of the Mean.

The sampling distribution of the mean of a sample is obtained
by the frequency distribution of the means of all possible (a very
large number of) random samples of the same size obtainable
from the given population. It has already been indicated that
if sample after sample is taken from the same population, the
mean, etc., of each of these samples will usually differ. If, say,
100 samples are randomly selected there will be 100 means of
such selected samples which will usually differ from each other.
If a series or frequency distribution of these 100 sample means
is formed, it will be called the 'sampling distribution of the mean'.
The form of the sampling distribution of the mean of a fairly
large random sample selected from a normal population will be
that of the normal distribution. It has been found that this
distribution will also closely approximate to the normal distribution
even if the original population, from which random samples
have been selected, is not normally distributed, unless, of
course, the departure from normality in the original population
is extreme and the size of the sample is small.

It can be proved mathematically that the mean of the
population and the mean of the sampling distribution are the
same. We know that the standard deviation measures the
dispersion about the mean. Here the standard deviation of the
'sampling distribution' measures the dispersion of the means of
all samples of the given size $N$ about the mean of the universe
(also the mean of the sampling distribution). In other words, it
measures the average sampling error involved in estimating
the mean of the universe by the mean of a sample of size $N$.
This is because the difference between any sample mean and the
population mean (or mean of the sampling distribution) is the
sampling error of that sample mean. The sum of the sampling
errors of the means of all samples of size $N$ will be zero, because
the errors will deviate on an average equally on both sides
of the mean. The sum of the squares of all the sampling errors
divided by their number is the sampling variance, and the
square root of it gives the standard error. Thus the standard error
(which has already been stated above as the S.D. of the sampling
distribution of the means of the samples) can be described as
the quadratic mean of the sampling errors of the arithmetic
means of all samples of size $N$.
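
The idea of a sampling distribution can be illustrated by simulation. The sketch below is a hedged illustration, not a procedure from the text: the skewed (exponential) population, its size of 20,000, the sample size of 50 and the 4,000 repetitions are all assumptions chosen for the example. Samples are drawn with replacement so that successive draws are independent.

```python
import random
import statistics

random.seed(0)  # fixed seed so the illustration is repeatable

# A deliberately non-normal (skewed) universe of 20,000 items.
population = [random.expovariate(1.0) for _ in range(20000)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

N = 50  # size of each random sample
sample_means = [
    statistics.fmean(random.choices(population, k=N))  # one sample mean
    for _ in range(4000)
]

mean_of_means = statistics.fmean(sample_means)    # centre of the sampling distribution
standard_error = statistics.pstdev(sample_means)  # its standard deviation
theoretical_se = pop_sd / N**0.5                  # population S.D. / sqrt(N)

print(pop_mean, mean_of_means)
print(standard_error, theoretical_se)
```

The mean of the sample means falls close to the mean of the universe, and the standard deviation of the sample means falls close to the population standard deviation divided by $\sqrt{N}$, even though the universe itself is far from normal.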

LARGE SAMPLES 

Theorems

The above explanations of the sampling distribution are
based upon the following mathematically proved theorems:

I Theorem

If random samples of size $N$ are taken from a normal
distribution which has arithmetic mean $\mu$ and standard deviation $\sigma$,
the sampling distribution of the arithmetic means of the
samples of size $N$ will be a normal distribution with arithmetic
mean $\mu$ and standard deviation $\dfrac{\sigma}{\sqrt{N}}$.

II Theorem (The Central Limit Theorem)

If random samples of size $N$ are taken from a universe
(even if not normally distributed) which has arithmetic mean $\mu$
and standard deviation $\sigma$, and if $N$ is large (30 or more), the
sampling distribution of the arithmetic mean of samples of
size $N$ will very closely approximate a normal distribution with
arithmetic mean $\mu$ and standard deviation $\dfrac{\sigma}{\sqrt{N}}$.


Standard Error of the Mean 


Under the conditions of either of the above two theorems,
the standard error of the mean (of a sample), which is represented
by the symbol $\sigma_{\bar{x}}$, is

$\sigma_{\bar{x}} = \dfrac{\sigma_p}{\sqrt{N}}$

where $\sigma_p$ is the population S.D.

It may be noted that the larger the sample size $N$, the smaller
will be $\sigma_{\bar{x}}$ (the standard error of the mean), for it has already been
stated that the dispersion of the sampling distribution of the
mean varies inversely as the square root of the sample size.
Further, it is also possible to make the S.E. as small as one
wishes by enlarging the sample size $N$. The reason for this is
obvious, i.e., $\sigma_{\bar{x}}$ measures sampling error, and this error, as
distinguished from other errors called non-sampling errors
arising from bias etc., is completely under the control of the
investigator planning the procedure of the sampling.


Interpretation of Standard Error 

For the interpretation of the S.E., one has to look to the
area property of the normal curve. We know that the area
lying between ± one standard deviation from the mean in a
normal curve is equal to 68.27% of the whole area. Similarly,
± 2 and ± 3 standard deviations cover 95.45% and 99.73%
respectively of the total area. In other words, ± 3σ from the
mean does not cover .27%, or roughly 3 items out of 1,000 cases,
i.e., practically all the items are covered within the range of
± 3σ. Again if we take ± 1.96σ and ± 2.5758σ, the respective
areas covered will be 95% and 99%, or 5% and
1% respectively of the total area will be left uncovered.
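
The areas quoted above can be reproduced from the error function, since the fraction of a normal distribution lying within ± k standard deviations of the mean is $\mathrm{erf}(k/\sqrt{2})$. The function name `area_within` is an illustrative choice made for this sketch.

```python
import math

def area_within(k: float) -> float:
    """Fraction of a normal curve lying within +/- k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3, 1.96, 2.5758):
    covered = 100 * area_within(k)
    print(f"+/- {k} S.D. covers {covered:.2f}% and leaves {100 - covered:.2f}% uncovered")
```

The printed percentages agree with the figures of 68.27%, 95.45%, 99.73%, 95% and 99% given in the text.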

As the standard error is the standard deviation of the sampling
distribution of the mean, and as this distribution resembles that
of the normal curve, the area lying between ± one S.E. from
the mean will also cover 68.27% of the area (samples) from the
universe. Exactly the same will be the coverage of area
(samples) for the other values of standard errors as that of the
corresponding standard deviations just stated.

These values of standard errors are called critical values.
Thus if mean ± 1.96 S.E. covers an area of 95%, which
means that 5% of the area is left uncovered, then the above value of
1.96 is called the critical value at the 5% level of significance. It may
be stated here that if the figures of critical value and S.E. are
known, the sampling error for the given level of significance
can be found, as the sampling error for a given level is equal to the
product of the S.E. and the critical value.

The signs of plus (+) and minus (-) determine the range
and also the lower and upper limits, called confidence limits.
The interval between the confidence limits is known as the
confidence interval.

Statistical Inference 

So far we have presumed that population parameters, such
as mean, standard deviation etc., are known. But in actual
practice these are often unknown. Consequently the problem
which is usually faced is not to set the limits within which
sample statistics are expected to deviate from the population
parameters, but to assign the limits within which population
parameters are expected to lie from the sample statistics.
This latter problem involves the problem of statistical induction,
wherein the derived inference is based on empirical (experimental
or observed sample statistics) evidence, as against the
deductive inference, which is derived from assumptions or premises and
cannot be false if the premises are true. In other words, the
logical process by which generalisations are arrived at
from a study of particular cases is termed induction, as opposed
to deduction, in which specialised conclusions are drawn from
general propositions. By statistical induction or statistical
inference is meant the generalisation of statistical results (i.e.,
sample statistics), the application to a universe or population
of measurements derived from a sample. Thus, in practice,
the real problem is to set confidence limits for the unknown
population parameters, i.e., assigning limits with the help of
sample statistics which will contain the population parameters
in a certain per cent of all cases in the long run.

Formulae to find standard error for various sample statistics 
when size of sample is large : 

(1) Standard Error of Mean or $\sigma_{\bar{x}}$

(a) When the standard deviation of the population is
known,

$\sigma_{\bar{x}} = \dfrac{\sigma_p}{\sqrt{N}}$

where $\sigma_p$ is the standard deviation of the population and $N$ is
the number of items in the sample.

(b) When $\sigma_p$ is unknown, i.e., only the standard deviation
of the sample is known,

$\sigma_{\bar{x}} = \dfrac{\sigma \text{ (sample)}}{\sqrt{N - 1}}$

But if $N$ is sufficiently large, then whether we apply
$\sqrt{N}$ or $\sqrt{N - 1}$ the result remains almost unaffected. So even
when $\sigma_p$ is unknown, i.e., only $\sigma$ (sample) is known, we can
divide it by $\sqrt{N}$, i.e.,

$\sigma_{\bar{x}} = \dfrac{\sigma \text{ (pop. or sample)}}{\sqrt{N}}$

But it should always be remembered that if the standard deviations of both
population and sample are available, only that of the population
should be used in the calculation of the standard error for various sample
statistics.
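
How little the choice between $\sqrt{N}$ and $\sqrt{N-1}$ matters for large samples can be seen from a small sketch. The function `se_mean` and its flag `sd_is_population` are hypothetical names introduced for this illustration, and the standard deviation of 3 is an arbitrary trial value.

```python
import math

def se_mean(sd: float, n: int, sd_is_population: bool = True) -> float:
    """Standard error of the mean.

    Divides by sqrt(n) when sd is the population S.D., and by
    sqrt(n - 1) when only the sample S.D. is available.
    """
    divisor = n if sd_is_population else n - 1
    return sd / math.sqrt(divisor)

for n in (10, 100, 1000):
    with_n = se_mean(3.0, n)                                  # uses sqrt(N)
    with_n_minus_1 = se_mean(3.0, n, sd_is_population=False)  # uses sqrt(N - 1)
    print(n, round(with_n, 4), round(with_n_minus_1, 4))
```

The two versions differ noticeably for a sample of 10 but become practically indistinguishable as the sample size grows, which is why $\sqrt{N}$ may be used throughout for large samples.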





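(2) S.E. of Median or $\sigma_{m} = 1.25331\, \dfrac{\sigma}{\sqrt{N}}$

(3) S.E. of Quartiles or $\sigma_{q} = 1.36263\, \dfrac{\sigma}{\sqrt{N}}$

(4) S.E. of Mean Deviation or $\sigma_{\delta} = .6028\, \dfrac{\sigma}{\sqrt{N}}$

(5) S.E. of Quartile Deviation or $\sigma_{Q.D.} = .78672\, \dfrac{\sigma}{\sqrt{N}}$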


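(6) S.E. of Standard Deviation or $\sigma_{\sigma} = \dfrac{\sigma}{\sqrt{2N}}$

(7) S.E. of Variance or $\sigma_{\sigma^2} = \sigma^2 \sqrt{\dfrac{2}{N}}$

(8) S.E. of Coefficient of Skewness or $\sigma_{j} = \sqrt{\dfrac{3}{2N}}$

(9) S.E. of Coefficient of Variation or $\sigma_{V} = \dfrac{V}{\sqrt{2N}}$

(10) S.E. of Coefficient of Correlation or $\sigma_{r} = \dfrac{1 - r^2}{\sqrt{N}}$

(11) S.E. of Coefficient of Regression or $\sigma_{b_{xy}} = \dfrac{\sigma_x \sqrt{1 - r^2}}{\sigma_y \sqrt{N}}$

(12) S.E. of Regression Estimate

S.E. of Estimate of $x$ or $S_x = \sigma_x \sqrt{1 - r^2}$

S.E. of Estimate of $y$ or $S_y = \sigma_y \sqrt{1 - r^2}$

(13) S.E. of Coefficient of Association or $\sigma_{Q} = \dfrac{1 - Q^2}{2} \sqrt{\dfrac{1}{(AB)} + \dfrac{1}{(A\beta)} + \dfrac{1}{(\alpha B)} + \dfrac{1}{(\alpha\beta)}}$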





The above formulae of the standard error of a sample statistic
will give two limits within which the corresponding population
parameters are expected to occur. Further, if the parameters are
known, the respective standard error will give us the expected
range of difference between the parameters and statistics, and
thus these limits will also give a basis for treating the actual
difference as insignificant (owing to chance errors) or significant
(not owing to sampling or chance errors but to non-sampling
errors, and so the conclusion will be that the sample results
differ significantly).

When two sample statistics are given, one has to find
out whether the difference between the two is insignificant, i.e., the
two sample statistics belong to the same set (universe) and the
difference is due to chance, or the difference is significant, i.e.,
either the two samples come from different universes or the
selection of the sample has not been in accordance with the
random sampling method. Here again, to find the significance of
the difference between two statistics, we have to calculate the S.E.
as mentioned below.


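(1) S.E. of Difference between two Sample Means

(a) For unrelated series (samples not from such universes
which may be correlated), when

(i) $\sigma_p$ is unknown: $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$

(ii) $\sigma_p$ is known: $\sigma_{\bar{x}_1 - \bar{x}_2} = \sigma_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$

(iii) the sample mean from a sample of size $n_1$ is compared
with the combined mean of two samples:

$\sigma = \sigma_p \sqrt{\dfrac{n_2}{n_1 (n_1 + n_2)}}$

(b) For related series (samples from such universes which
are correlated):

$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2} - 2r\, \dfrac{\sigma_1 \sigma_2}{\sqrt{n_1 n_2}}}$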



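(2) S.E. of the Difference of Two Sample Medians

$\sigma_{m_1 - m_2} = \sqrt{\sigma_{m_1}^2 + \sigma_{m_2}^2}$

(3) S.E. of the Difference of Two Standard Deviations

(i) when $\sigma_p$ is known: $\sigma_{\sigma_1 - \sigma_2} = \sigma_p \sqrt{\dfrac{1}{2 n_1} + \dfrac{1}{2 n_2}}$

(ii) when $\sigma_p$ is unknown: $\sigma_{\sigma_1 - \sigma_2} = \sqrt{\dfrac{\sigma_1^2}{2 n_1} + \dfrac{\sigma_2^2}{2 n_2}}$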


The following pages are devoted to illustrating
the application of some of the important formulae mentioned
above.

Illustrations on Standard Error of the Mean :

(1) Suppose the standard deviation of the heights of
university male students is 3". One hundred male students of a
university are measured and their mean height is found to be
68". Ascertain if this (sample) mean height represents a
significant divergence from the population mean of 70".

Solution :

Standard Error of Mean

$$= \frac{\sigma}{\sqrt{N}} = \frac{3}{\sqrt{100}} = \frac{3}{10} = 0.3''$$

The actual difference between the population and sample
means is 70" - 68" = 2". This difference is about 6.7 times the
above computed standard error, viz., 0.3". Hence there is
very little possibility of this difference of 2" having arisen
due to sampling fluctuations. Consequently the divergence
between the two means is significant.
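
The arithmetic of this illustration can be checked with a minimal Python sketch. The function name is hypothetical, and the sample mean of 68" is an assumption taken from the 2" difference quoted in the solution.

```python
import math

def standard_error_of_mean(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

# Figures from Illustration (1); 68" is the sample mean implied by
# the 2" difference used in the solution (an assumption).
sigma, n = 3.0, 100
population_mean, sample_mean = 70.0, 68.0

se = standard_error_of_mean(sigma, n)
ratio = abs(population_mean - sample_mean) / se  # difference in S.E. units

print(f"S.E. = {se:.1f} in., difference = {ratio:.1f} x S.E.")
```

A ratio well beyond 3 is what leads the text to call the divergence significant.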

(2) If the standard deviation of pulse rate in adults of the
age group of 20-25 is 9.9, and the normal pulse rate is 69,
would you conclude a significant difference at 2% level of
significance if a group of 81 people of the same age group suffering
from a disease were found to have a pulse rate of 74 ?






Solution :

Standard Error of the Mean

$$= \frac{\sigma}{\sqrt{N}} = \frac{9.9}{\sqrt{81}} = \frac{9.9}{9} = 1.1$$

At 2% level of significance, the area of the normal curve is
98% and this area is covered by $\bar{X} \pm 2.32$ S.E. Consequently
we have to find the value of 2.32 standard errors to conclude whether
the difference between the two means at 2% level of significance is
significant or otherwise.

The value of 2.32 S.E. = 1.1 x 2.32 = 2.552.

The actual difference is 74 - 69, or 5, which is much more
than 2.552. Hence it can be safely concluded that the difference
between the normal and sample pulse rates at 2% level of
significance is significant.

(3) Find out the standard error of the mean of a sample of
the following 400 adult males, comprising 4 samples, relating
to heights in inches :

          Heights in inches for adult males of

                      Jammu   Kashmir   Himachal   Punjab
                                        Pradesh
Mean                  66.62    67.31     67.78     68.55
Standard Deviation     2.35     2.56      2.17      2.5

The number of male adults measured in each of the above
mentioned states is 100.


Solution :

Here we have to find the standard error of the combined
mean. Consequently we have to compute first the combined
mean and also the combined standard deviation, and then
to find out the standard error of this combined mean.
Let $\bar{X}_{1234}$ represent the combined mean of Jammu, Kashmir,
Himachal Pradesh and Punjab and similarly $\sigma_{1234}$ their
combined standard deviation.

$$\bar{X}_{1234} = \frac{n_1\bar{X}_1+n_2\bar{X}_2+n_3\bar{X}_3+n_4\bar{X}_4}{n_1+n_2+n_3+n_4}$$

where $\bar{X}_1, \bar{X}_2, \bar{X}_3, \bar{X}_4$ and $n_1, n_2, n_3, n_4$ represent the individual
mean and the size of the sample of adults from Jammu,
Kashmir, Himachal Pradesh and Punjab respectively.
Substituting the values, we get

$$\bar{X}_{1234} = \frac{(66.62\times100)+(67.31\times100)+(67.78\times100)+(68.55\times100)}{100+100+100+100} = \frac{27{,}026}{400} = 67.56 \text{ approx.}$$

$$\sigma_{1234} = \sqrt{\frac{\{n_1\sigma_1^2+n_1(\bar{X}_1-\bar{X}_{1234})^2\}+\{n_2\sigma_2^2+n_2(\bar{X}_2-\bar{X}_{1234})^2\}+\{n_3\sigma_3^2+n_3(\bar{X}_3-\bar{X}_{1234})^2\}+\{n_4\sigma_4^2+n_4(\bar{X}_4-\bar{X}_{1234})^2\}}{n_1+n_2+n_3+n_4}}$$

where $\sigma_1, \sigma_2, \sigma_3, \sigma_4$ represent the standard deviations of heights
of adults from Jammu, Kashmir, Himachal Pradesh and
Punjab respectively.

Substituting the values, we get

$$\sigma_{1234} = \sqrt{\frac{\{100\times2.35^2+100(66.62-67.56)^2\}+\{100\times2.56^2+100(67.31-67.56)^2\}+\{100\times2.17^2+100(67.78-67.56)^2\}+\{100\times2.5^2+100(68.55-67.56)^2\}}{100+100+100+100}}$$

$$= \sqrt{\frac{\{552.25+88.36\}+\{655.36+6.25\}+\{470.89+4.84\}+\{625+98.01\}}{400}}$$

$$= \sqrt{\frac{2500.96}{400}} = \sqrt{6.25} = 2.5 \text{ approx.}$$

Standard Error of the Combined Mean

$$= \frac{\sigma_{1234}}{\sqrt{N}} = \frac{2.5}{\sqrt{400}} = \frac{2.5}{20} = 0.125$$
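
The combined mean, the combined standard deviation and the resulting standard error can be sketched in Python as follows. The helper names are hypothetical, and the Punjab mean of 68.55 is an assumption based on the figure used in the substitution step.

```python
import math

# State figures as used in the worked solution (Punjab mean assumed 68.55).
means = [66.62, 67.31, 67.78, 68.55]
sds = [2.35, 2.56, 2.17, 2.5]
sizes = [100, 100, 100, 100]

def combined_mean(means, sizes):
    """Weighted mean of several sample means."""
    return sum(n * m for m, n in zip(means, sizes)) / sum(sizes)

def combined_sd(means, sds, sizes):
    """Combined S.D.: within-sample variance plus spread of the sample means."""
    grand = combined_mean(means, sizes)
    total = sum(n * s ** 2 + n * (m - grand) ** 2
                for m, s, n in zip(means, sds, sizes))
    return math.sqrt(total / sum(sizes))

grand_mean = combined_mean(means, sizes)
sd = combined_sd(means, sds, sizes)
se = sd / math.sqrt(sum(sizes))  # S.E. of the combined mean

print(round(grand_mean, 3), round(sd, 2), round(se, 3))
```

The sketch keeps the unrounded combined mean, so intermediate figures differ slightly from the hand calculation, but the standard error still rounds to 0.125.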






(4) The data concerning height measurement for a random
sample of individuals from a given population are as follows :

Mean is equal to 172
S.D. is equal to 12
N is equal to 65

If a large number of samples of the same size were selected
at random from the given population, what would be the limits
within which the true mean almost certainly lies ?

Solution :

Standard Error of the Mean Height of the sample or

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}} = \frac{12}{\sqrt{65}} = 1.5 \text{ approx.}$$

We know that $\bar{X} \pm 3$ S.E. covers 99.73% of the total area
or cases, thus leaving only 0.27% of the cases. Therefore,
if this value of 3 S.E. is found out and the minimum and
maximum limits are assigned, we can confidently conclude
that within these limits the true mean (of the given population)
almost certainly lies.

$$\bar{X} \pm 3\,\text{S.E.} = 172 \pm 3(1.5) = 172 \pm 4.5$$

= 167.5 to 176.5

Thus the required limits will be 167.5 to 176.5.

(5) If it costs a rupee to draw one member of a sample,
how much would it cost, in sampling from a universe with
mean 100 and standard deviation 10, to take sufficient members
to ensure that the mean of the sample in all probability would
be within 0.01 per cent of the true value ? Also find the
additional cost to double the precision.

Solution :

We know that mean $\pm$ 3 standard error covers 99.73%
(or leaves .27%) of the total area or cases, which, in other
words, amounts to overall coverage in all probability.

Standard Error of the Mean of a Sample or $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{N}}$

In all probability the difference between the sample value and the
population mean should be 3 times the S.E. or $\dfrac{3\sigma}{\sqrt{N}}$, and the
given value of it is 0.01% of the mean (100 given), i.e., 0.01,

or $\dfrac{3\sigma}{\sqrt{N}} = 0.01$ or $\dfrac{3\times10}{\sqrt{N}} = 0.01$

or $30 = 0.01\sqrt{N}$ or $3{,}000 = \sqrt{N}$

or $N = 9{,}000{,}000$

So the number of members sufficient to ensure that the mean
of the sample in all probability be within 0.01% of the true
value is 9,000,000 and consequently the total cost will be
Rs. 90 lakhs.

To double the precision means to halve the standard error.
In order to halve the standard error or double the accuracy,
the size of the sample should be fourfold, i.e., it should be
36,000,000. But in the question the additional cost is asked,
which will be Rs. 36,000,000 - Rs. 9,000,000 = Rs. 27,000,000.
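
The sample-size reasoning above can be sketched in Python. The function name and the constant are hypothetical; exact fractions are used only to avoid floating-point rounding at the boundary.

```python
import math
from fractions import Fraction

def required_sample_size(sigma, tolerance, k=3):
    """Smallest N for which k * sigma / sqrt(N) <= tolerance (exact arithmetic)."""
    ratio = Fraction(k) * Fraction(sigma) / Fraction(tolerance)
    return math.ceil(ratio ** 2)

COST_PER_MEMBER = 1  # rupees, as stated in the illustration

tolerance = Fraction(1, 100)  # 0.01% of the mean of 100
n_basic = required_sample_size(10, tolerance)
n_double = required_sample_size(10, tolerance / 2)  # precision doubled
additional_cost = (n_double - n_basic) * COST_PER_MEMBER

print(n_basic, n_double, additional_cost)
```

Halving the tolerance quadruples the sample, which is why the additional cost is three times the original cost.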

(6) The standard deviation of a large number, N, of
observations on chest measurement is 0.5". If random samples of
400 individuals are drawn from the population of N individuals,
what is the probability that the sample mean will differ
from the parameter mean by 0.05" or more ?

Solution :

Standard error of mean or

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}} = \frac{0.5}{\sqrt{400}} = \frac{0.5}{20} = 0.025$$

The difference (between sample mean and parameter
mean) is 2 times the standard error of the mean, viz.,

$$\frac{.05}{.025} = 2.$$

In other words, the difference of .05 is equal to
2 S.E., which covers 95.45% of the total area (in both directions
from the mean) of the normally distributed population, and
leaves only 4.55% of the area outside $\pm$ 2 S.E. from the mean.

Consequently the probability that the sample mean will
differ from the parameter mean by 0.05 or more in the above
case is .0455 or 1/22 approx.
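
The tail area quoted above can be checked without tables. This is a minimal sketch using the complementary error function from Python's standard library; the function name is hypothetical.

```python
import math

def two_tailed_probability(z):
    """P(|Z| >= z) for a standard normal deviate Z."""
    return math.erfc(z / math.sqrt(2))

se = 0.5 / math.sqrt(400)       # standard error of the mean
z = 0.05 / se                   # difference expressed in S.E. units
p = two_tailed_probability(z)   # area outside +/- z S.E.

print(round(z, 2), round(p, 4))
```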

Illustration on Standard Error of Standard Deviation :

1,800 persons of a certain age group were observed to have
a standard deviation of pulse rate of 9.2 beats per minute. Assign the
limits for the standard deviation of the population, assuming
the above sample of 1,800 persons came from a normally
distributed universe.

Solution :

Standard Error of Standard Deviation

$$\text{or } \sigma_{\sigma} = \frac{\sigma}{\sqrt{2N}}$$

Substituting the values, we get

$$\sigma_{\sigma} = \frac{9.2}{\sqrt{2\times1{,}800}} = \frac{9.2}{60} = 0.153$$

As thrice the standard error covers almost the total number
of cases (to be exact, 99.73% of cases), the population standard
deviation should not differ from 9.2 by more than 3 S.E., i.e., it
should lie within 9.2 $\pm$ 3(.153) :

9.2 - 3(.153) = 9.2 - .459 = 8.741
9.2 + 3(.153) = 9.2 + .459 = 9.659

Hence the limits of the population standard deviation are
8.741 to 9.659, i.e., between these (two) minimum and maximum
values the parameter standard deviation should lie.


Illustration on Standard Error of Median, Quartiles etc. :

Compute the standard errors of Median, Quartiles, Mean
Deviation, Variance and Quartile Deviation if the standard
deviation is 10 and the number of cases included in a sample is 2,500,
presuming that the sample has been drawn from a normal
population.

Solution :

S.E. of Median

$$= 1.25331\,\frac{\sigma}{\sqrt{N}} = 1.25331\times\frac{10}{\sqrt{2{,}500}} = 1.25331\times\frac{1}{5} = .25066$$

S.E. of Quartiles (both 1st and 3rd)

$$\text{S.E.}_{Q} = 1.36263\,\frac{\sigma}{\sqrt{N}} = 1.36263\times\frac{10}{\sqrt{2{,}500}} = 1.36263\times\frac{1}{5} = .27253$$

S.E. of Mean Deviation

$$\text{S.E.}_{m.d.} = .6028\,\frac{\sigma}{\sqrt{N}} = .6028\times\frac{10}{\sqrt{2{,}500}} = .6028\times\frac{1}{5} = .12056$$

S.E. of Variance

$$= \sigma^2\sqrt{\frac{2}{N}} = (10)^2\sqrt{\frac{2}{2{,}500}} = 100\times.0283 = 2.83$$

S.E. of Quartile Deviation

$$\text{S.E.}_{Q.D.} = .78672\,\frac{\sigma}{\sqrt{N}} = .78672\times\frac{10}{\sqrt{2{,}500}} = .78672\times\frac{1}{5} = .15734$$
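
The five standard errors above, together with the standard error of the standard deviation from the preceding illustration, can be tabulated with a short Python sketch. The function name and dictionary keys are hypothetical; the numerical constants are those quoted in the text and assume a normal population.

```python
import math

def standard_errors(sigma, n):
    """Large-sample standard errors for a normal population (constants from the text)."""
    root_n = math.sqrt(n)
    return {
        "median": 1.25331 * sigma / root_n,
        "quartile": 1.36263 * sigma / root_n,
        "mean_deviation": 0.6028 * sigma / root_n,
        "variance": sigma ** 2 * math.sqrt(2 / n),
        "quartile_deviation": 0.78672 * sigma / root_n,
        "standard_deviation": sigma / math.sqrt(2 * n),
    }

se = standard_errors(sigma=10, n=2500)
for name, value in se.items():
    print(f"{name:20s} {value:.5f}")

# Pulse-rate illustration: S.D. 9.2 from 1,800 persons, limits at 3 S.E.
se_sd = standard_errors(sigma=9.2, n=1800)["standard_deviation"]
limits = (9.2 - 3 * se_sd, 9.2 + 3 * se_sd)
```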

Illustrations on Standard Error of Coefficient of Correlation

(1) A sample study of 2,500 couples gives a correlation
coefficient of 0.45. Estimate the limits to the correlation in
the universe.

Solution :

The S.E. of the correlation coefficient is

$$\text{S.E.}_r = \frac{1-r^2}{\sqrt{N}}$$

where r is the correlation coefficient.*
Substituting the values, we get

$$\text{S.E.}_r = \frac{1-(0.45)^2}{\sqrt{2{,}500}} = \frac{1-0.2025}{50} = \frac{0.7975}{50} = .01595 \text{ or } .016$$

In all probability, the parameter coefficient of correlation
should not differ by more than thrice the S.E. of r from the sample
correlation coefficient, as sample $r \pm 3$ S.E. of r would cover
99.73% of the total population. So the limits to the coefficient
of correlation are

0.45 - 3(.016) = 0.45 - .048 = 0.402
0.45 + 3(.016) = 0.45 + .048 = 0.498

* It should be noted that S.E. of $r = (1-r^2)/\sqrt{N}$ should be used only when
r is moderate, say, less than .5, and N is large; otherwise the t-test of the
significance of r should be used.

Thus we can confidently expect that the parameter or
population correlation coefficient should be within the limits
of 0.402 to 0.498.

(2) A correlation coefficient of 0.2 is obtained from a
random sample of 1,600 pairs of observations. Do you think
this value of the correlation coefficient is significant ?

Solution :

To conclude whether the value of .2 is significant, i.e.,
whether the observed pairs are really correlated, it is necessary
to find out the value of r which may arise on account of
chance when 1,600 pairs are observed, presuming that the
observed pairs are uncorrelated.

On the hypothesis that the pairs are uncorrelated, viz., r = 0,

$$\text{S.E.}_r = \frac{1-0^2}{\sqrt{1{,}600}} = \frac{1}{40} = .025$$

We know that 3 S.E. covers 99.73% of cases; therefore, the
upper limit of r will be 3 x .025 = .075 on account of sampling
fluctuations. But the value of the observed r is .2, which is
many times this value, so we can safely conclude that the value
of r, viz., .2, is highly significant, i.e., the observed pairs are
really correlated.
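
Both correlation illustrations rest on the same large-sample formula, which can be sketched in Python as below. The function name is hypothetical, and the formula carries the restriction noted in the text (moderate r, large N).

```python
import math

def se_correlation(r, n):
    """Large-sample standard error of r: (1 - r^2) / sqrt(n)."""
    return (1 - r ** 2) / math.sqrt(n)

# Illustration (1): limits for the universe correlation at r +/- 3 S.E.
se1 = se_correlation(0.45, 2500)
lower, upper = 0.45 - 3 * se1, 0.45 + 3 * se1

# Illustration (2): largest r expected by chance if the universe r is 0.
se0 = se_correlation(0.0, 1600)
chance_limit = 3 * se0
is_significant = 0.2 > chance_limit

print(round(lower, 3), round(upper, 3), round(chance_limit, 3), is_significant)
```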

Illustrations on Standard Error of the Difference between the Means of
two samples

(1) The mean produce of wheat of a sample of 100 fields
comes to 200 lbs. per acre with a standard deviation of 10 lbs.
Another sample of 150 fields gives the mean at 220 lbs. with
a standard deviation of 12 lbs. Assuming the standard deviation
of the yield at 11 lbs. for the universe, find out if there
is a significant difference between the mean yields of the two
samples. (Agra,

Solution :

As the standard deviation of the universe is given, we
will not presume that the samples come from different universes
in spite of the fact that the standard deviation of each of the
samples is given separately. S.E. of $\bar{X}_1-\bar{X}_2$, or standard error of
the difference between two sample means,

$$= \sigma_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}$$

where $\sigma_p$ is the standard deviation of the universe ;
$n_1$ is the number of the first sample ;
$n_2$ is the number of the second sample.

Substituting the values, we get

$$\text{S.E.}_{\bar{X}_1-\bar{X}_2} = 11\sqrt{\frac{1}{100}+\frac{1}{150}} = 11\sqrt{\frac{1}{60}} = 1.42$$

The actual difference between the two sample means is
220 - 200 = 20, which is $\dfrac{20}{1.42}$ or about 14 times the S.E. of the
difference of means. This leads us to conclude that this
difference is very unlikely to arise on account of chance and so
the difference is highly significant.

(2) Intelligence tests of two groups of boys and girls give
the following results. Examine if the difference is significant.

Girls : Mean 84, S.D. 10 ; Number 121.
Boys : Mean 81, S.D. 12 ; Number 81.

Solution :

Presuming that the above two random samples come from
different universes, the standard error of the difference between
their means would be

$$\text{S.E.}_{\bar{X}_1-\bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}$$

where $\sigma_1$ and $n_1$ represent respectively the S.D. and number
of girls,

and $\sigma_2$ and $n_2$ represent respectively the S.D. and number
of boys.

Substituting the values, we get

$$\text{S.E.}_{\bar{X}_1-\bar{X}_2} = \sqrt{\frac{10^2}{121}+\frac{12^2}{81}} = \sqrt{\frac{100}{121}+\frac{144}{81}} = \sqrt{2.604} = 1.61$$

The observed difference between the two means is 84 - 81 =
3, which is $\dfrac{3}{1.61}$ or about 1.9 times the S.E. Consequently
there is little ground to doubt the
hypothesis that the difference is due to sampling fluctuations
and hence the difference is probably insignificant.

(3) A random sample of 600 villages was taken from
district Kanpur and the average population per village was found
to be 480 with a S.D. of 48. Another random sample of 400
villages from the same district gave an average population of
500 per village with a S.D. of 54. If the S.D. of the universe
(District Kanpur) is 10, find out if the mean of the
first sample significantly differs from the combined mean of the
two samples taken together.

Solution :

Combined mean

$$\bar{X}_{12} = \frac{600\times480+400\times500}{600+400} = \frac{488{,}000}{1{,}000} = 488.$$

S.E. of $\bar{X}_1-\bar{X}_{12}$, or Standard Error of the difference between the
first sample mean and the combined mean,

$$= \sqrt{\sigma_p^2\,\frac{n_2}{n_1(n_1+n_2)}}$$

where $\sigma_p$ is the S.D. of the universe.

Substituting the values, we get

$$\text{S.E.}_{\bar{X}_1-\bar{X}_{12}} = \sqrt{10^2\times\frac{400}{600(600+400)}} = \sqrt{100\times\frac{400}{600{,}000}} = \sqrt{0.067} = 0.26$$

The observed difference between the first sample mean and
the combined sample mean is 488 - 480 = 8. This difference
is $\dfrac{8}{.26}$ or about 30 times the standard error. Consequently
the difference is definitely significant, as the chance of such a
difference arising is very small.
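
The three cases worked above differ only in the formula for the standard error of the difference. A minimal Python sketch follows; the function names are hypothetical.

```python
import math

def se_diff_known_sigma(sigma_p, n1, n2):
    """Universe S.D. known: sigma_p * sqrt(1/n1 + 1/n2)."""
    return sigma_p * math.sqrt(1 / n1 + 1 / n2)

def se_diff_unknown_sigma(s1, n1, s2, n2):
    """Universe S.D. unknown: sqrt(s1^2/n1 + s2^2/n2)."""
    return math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

def se_mean_vs_combined(sigma_p, n1, n2):
    """First sample mean against the combined mean of both samples."""
    return sigma_p * math.sqrt(n2 / (n1 * (n1 + n2)))

se_wheat = se_diff_known_sigma(11, 100, 150)        # illustration (1)
se_tests = se_diff_unknown_sigma(10, 121, 12, 81)   # illustration (2)
se_kanpur = se_mean_vs_combined(10, 600, 400)       # illustration (3)

# Observed differences expressed in units of the standard error.
print(round(20 / se_wheat, 1), round(3 / se_tests, 1), round(8 / se_kanpur, 1))
```

Only the second ratio falls below 3, which matches the verdict that the difference between girls and boys is probably insignificant.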

Small Samples

The methods of statistical analysis for testing the significance
of sample statistics discussed so far were based on two
assumptions, viz. :

(i) The sample standard deviation is close to the population
standard deviation and as such can be used in its place for the
computation of standard errors. Thus in the computation
of the standard error of the Mean the standard deviation of
the sample is used in the absence of the standard deviation of
the population.

(ii) The distribution of sample statistics is Normal.
Because of this it is possible to assign limits within which the
difference between sample statistics and population parameters
is likely to lie.

These assumptions do not hold good when the size of the
sample is small (say less than 30). In fact for small values of
N (number of items included in the sample) the standard
deviation of the sample is subject to a definite bias, tending to
make it consistently lower than the standard deviation of the
population. Thus if the standard deviation of a small sample
is used in the computation of the standard error of the Mean,
the result will also have a downward bias. It can, therefore, be
said that when the methods discussed so far are applied to
small samples, the sampling errors to which our estimates are
subject are consistently under-estimated.

This under-estimation of the sampling error can be
shown in the following manner. If $\mu$ represents the mean
of the population, $\bar{X}$ the mean of a single sample drawn from
this population, and $\sigma_{\bar{X}}$ the standard error of $\bar{X}$ (the mean of
the sample) we may say

$$T = \frac{\bar{X}-\mu}{\sigma_{\bar{X}}}$$

T thus indicates the deviation of the mean of the sample from
the mean of the population expressed in terms of the standard
error of the sample mean. If $\sigma_{\bar{X}}$ is computed from the actual
standard deviation of the population or from the actual
distribution of sample means, T may be regarded as a normal deviate,
and the significance of its value may be determined with
reference to a table of areas under the normal curve. Thus if the value
of T is more than 3 it indicates that the difference between $\bar{X}$
and $\mu$ is very significant. But in actual practice we do not know
the $\sigma$ of the population, nor do we have the actual distribution
of the sample means, and hence we use S (the standard deviation
of the sample) as $\sigma$ (the standard deviation of the population)
and thus our $\sigma_{\bar{X}}$ is in reality $S_{\bar{X}}$. Even so the T obtained from
these approximations may be interpreted as a Normal deviate
provided the 'S' is obtained from large samples. When,
however, 'S' and '$S_{\bar{X}}$' are derived from small samples (giving them a
downward bias) the T derived from them may not be interpreted
as a normal deviate, for the distribution of T departs significantly
from normality with small samples. It is, therefore, necessary
that this fact is taken into account while making statistical
inferences.

Symbol 'T' and 't'

It has been pointed out earlier that the nature of 'T' varies
with the size of the sample. With a view to denote this
difference we use the symbol 'T' when it takes the form of the
normal distribution (as it does when approximations to it are
based on large samples). When its distribution departs from
normality (as it happens when approximations to it are based
on small samples) we represent it by the symbol 't'. Thus the
difference of the observed and the actual values expressed in
units of standard error in the case of small samples is represented
by 't'.

It may here be noted that for the computation of the
standard error of the sample statistics, the standard deviation
of a small sample is computed by the formula $\sqrt{\dfrac{\Sigma x^2}{N-1}}$
instead of $\sqrt{\dfrac{\Sigma x^2}{N}}$. This is done with a view to eliminate
the discrepancy which the mean of the standard deviations
of all possible small samples of similar size has as compared to
the standard deviation of the population. Thus the standard
error of the Mean in the case of a small sample would be calculated
as :

$$S_{\bar{X}} = \frac{\sqrt{\dfrac{\Sigma x^2}{N-1}}}{\sqrt{N}}$$

and consequently

$$t = \frac{\bar{X}-\mu}{\sqrt{\dfrac{\Sigma x^2}{N-1}}\Big/\sqrt{N}} = \frac{(\bar{X}-\mu)\sqrt{N}}{S}$$


it may here be noted that 





"t" distribution

It has been stated earlier that the 't' distribution departs
significantly from normality and as such proper correction
should be made to these values for purposes of obtaining
statistical inferences. This correction factor may be reduced to
definite terms as shown below :

If

$$T = \frac{x}{\sigma_{\bar{X}}} \quad\text{and}\quad t = \frac{x}{S_{\bar{X}}} \quad\text{(here S represents standard deviation)}$$

't' may be derived from T by the following method :

$$t = \frac{x}{S_{\bar{X}}} = \frac{x}{\sigma_{\bar{X}}} \div \frac{S_{\bar{X}}}{\sigma_{\bar{X}}}$$

This means that $\dfrac{x}{\sigma_{\bar{X}}}$ (i.e., T), which is a normally distributed
value, has been divided by the factor $\dfrac{S_{\bar{X}}}{\sigma_{\bar{X}}}$ to obtain the value
of 't'. The size of this correcting factor has been measured for
all sizes of small samples, and on its basis tables have been
constructed to show values of 't' corresponding to stated
probabilities for samples of varying sizes. It is with reference to these
values given in the 't' table that we interpret the calculated
value of 't'. Thus if the calculated value of 't' for a sample
consisting of (say) 6 items is to be interpreted, we have to
compare it with the table value for n = 5 (five degrees of freedom) at a
certain level (say 5%) of significance. If the calculated value
is less than the table value it can be said that the difference between the
observed and actual values is due to sampling errors.

Uses of the 't' distribution

In the following pages are given a few examples to illustrate
the method of using the 't' (commonly known as 'Student's')
distribution for testing the significance of various results obtained from
small samples. It should be noted that in making these tests
it has been assumed that the distribution of the population is
normal or nearly so.

Significance of a sample mean

When we are interested in finding out whether the mean
of a sample drawn from a normal population deviates
significantly from a stated value (the hypothetical value) of the
population mean, the following formula is used :

$$t = \frac{(\bar{X}-\mu)\sqrt{N}}{S}$$

where $\bar{X}$ represents the sample mean,

$\mu$ the mean of the population,

S the standard deviation of the sample,

N the number of items in the sample.

Illustration :

Nine individuals are chosen at random from a population
and their weights are found to be, in pounds, 110, 115, 118,
120, 122, 125, 128, 130, 139. In the light of these data,
discuss the suggestion that the mean weight of the population is
120 lbs.

Solution :

Calculation of the average weight and its standard
deviation.

No. of         Weight in     Deviation from           x^2
individuals    lbs. (X)      the average 123 (x)

    1            110              -13                  169
    2            115               -8                   64
    3            118               -5                   25
    4            120               -3                    9
    5            122               -1                    1
    6            125                2                    4
    7            128                5                   25
    8            130                7                   49
    9            139               16                  256

 N = 9       ΣX = 1,107                          Σx^2 = 602

Average weight of the sample or

$$\bar{X} = \frac{\Sigma X}{N} = \frac{1{,}107}{9} = 123$$

Standard deviation of the sample,

$$S = \sqrt{\frac{\Sigma x^2}{N-1}} = \sqrt{\frac{602}{8}} = 8.7$$

Substituting these values in the formula

$$t = \frac{(\bar{X}-\mu)\sqrt{N}}{S} = \frac{(123-120)\sqrt{9}}{8.7} = \frac{9}{8.7} = 1.03$$

The number of degrees of freedom = 9 - 1 = 8.
For 8 degrees of freedom at 5% level of significance, the
table value of t = 2.31. The calculated value of t (1.03) is less than
the table value and hence we can conclude that the data are
consistent with the suggestion that the mean weight in the
universe is 120 lbs.
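
The steps of this test (mean, standard deviation with N - 1, then t) can be sketched in Python. The function name is hypothetical and the table value of 2.31 is the one quoted in the text. Because the sketch carries S unrounded, it gives t of about 1.04 rather than the hand-computed 1.03.

```python
import math

def one_sample_t(values, mu):
    """Return (mean, S, t, degrees of freedom) for a small-sample test of the mean."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    t = (mean - mu) * math.sqrt(n) / s
    return mean, s, t, n - 1

weights = [110, 115, 118, 120, 122, 125, 128, 130, 139]
mean, s, t, df = one_sample_t(weights, mu=120)

TABLE_T_8_DF_5_PERCENT = 2.31  # table value quoted in the text
consistent = abs(t) < TABLE_T_8_DF_5_PERCENT

print(mean, round(s, 2), round(t, 2), df, consistent)
```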

Illustration :

Ten individuals are chosen at random from a population
and their heights are found to be, in inches, 63, 63, 66, 67, 68,
69, 70, 70, 71 and 71. In the light of the data discuss the
suggestion that the mean height in the population is 66 inches.

(M.A., Delhi,

Solution :

The hypothesis to be tested is that the mean height in
the population is 66 inches.

Calculation of the average height and its standard
deviation :

No. of         Height in     Deviation from           x^2
individuals    inches (X)    the mean 67.8 (x)

    1             63              -4.8                23.04
    2             63              -4.8                23.04
    3             66              -1.8                 3.24
    4             67              -0.8                  .64
    5             68               0.2                  .04
    6             69               1.2                 1.44
    7             70               2.2                 4.84
    8             70               2.2                 4.84
    9             71               3.2                10.24
   10             71               3.2                10.24

 N = 10       ΣX = 678                           Σx^2 = 81.6

$$\bar{X} = \frac{\Sigma X}{N} = \frac{678}{10} = 67.8$$

$$S = \sqrt{\frac{\Sigma x^2}{N-1}} = \sqrt{\frac{81.6}{9}} = \sqrt{9.07} = 3.01$$

$$t = \frac{(\bar{X}-\mu)\sqrt{N}}{S} = \frac{(67.8-66)\sqrt{10}}{3.01} = \frac{1.8\times3.162}{3.01} = 1.89$$

degrees of freedom = 10 - 1 = 9

For 9 degrees of freedom at 5% level of significance the table
value of t = 2.262. The calculated value of t is less than the
table value, hence our hypothesis holds good, i.e., the data are
consistent with a mean height in the population of 66 inches.

The "difference" Test

Illustration :

Eleven school boys were given a test in Geometry. They
were given a month's tuition and a second test was held at the
end of it. Do the marks give evidence that the students have
benefited by the extra coaching ?

(M.A., Delhi, 1964)

Boys     Marks        Marks
         1st Test     2nd Test

  1         23           24
  2         20           19
  3         19           22
  4         21           18
  5         18           20
  6         20           22
  7         18           20
  8         17           20
  9         23           23
 10         16           20
 11         19           17

Solution :

Boys    Marks       Marks       Difference     Deviation from     x^2
        1st Test    2nd Test    of marks (X)   the mean (1) (x)

  1        23          24          +1               0               0
  2        20          19          -1              -2               4
  3        19          22          +3               2               4
  4        21          18          -3              -4              16
  5        18          20          +2               1               1
  6        20          22          +2               1               1
  7        18          20          +2               1               1
  8        17          20          +3               2               4
  9        23          23           0              -1               1
 10        16          20          +4               3               9
 11        19          17          -2              -3               9

N = 11                          ΣX = 11                       Σx^2 = 50

Mean of the difference,

$$\bar{X} = \frac{\Sigma X}{N} = \frac{11}{11} = 1$$

$$S = \sqrt{\frac{\Sigma x^2}{N-1}} = \sqrt{\frac{50}{11-1}} = \sqrt{5} = 2.24$$

Let us take the hypothesis that the students have not
benefited by the extra coaching, i.e., the mean of the difference
between the marks of the two tests is zero ($\mu = 0$).

Applying the t-test

$$t = \frac{(\bar{X}-\mu)\sqrt{N}}{S} = \frac{\bar{X}\sqrt{N}}{S} = \frac{1\times\sqrt{11}}{2.24} = \frac{3.31}{2.24} = 1.48$$

degrees of freedom = 11 - 1 = 10.

For 10 degrees of freedom at 5% level of significance, the
table value of t = 2.228. The calculated value of t is less than
this table value. Hence our hypothesis holds good, i.e., the
marks of the two tests do not give evidence that the students
have benefited by the extra coaching.


Illustration :

The following table shows the result of an experiment
with 10 patients on the effects of two soporific drugs A and B
in producing sleep. Test the efficacy of these two drugs as
soporifics on the assumption that different random samples of
patients were used to test different drugs.

Patient     1     2     3     4     5     6     7     8     9    10

Drug A     .7  -1.6   -.2  -1.2   -.1   3.4   3.7    .8     0   2.0

Drug B    1.9    .8   1.1    .1   -.1   4.4   5.5   1.6   4.6   3.6

(B. Com., 1959)






Solution :

Patient    Drug A    Drug B    Difference       Deviation from      x^2
                               in effect (X)    mean (1.6) (x)

   1          .7       1.9        1.2              -.4              0.16
   2        -1.6        .8        2.4               .8              0.64
   3         -.2       1.1        1.3              -.3              0.09
   4        -1.2        .1        1.3              -.3              0.09
   5         -.1       -.1        0               -1.6              2.56
   6         3.4       4.4        1.0              -.6              0.36
   7         3.7       5.5        1.8               .2              0.04
   8          .8       1.6        0.8              -.8              0.64
   9         0         4.6        4.6              3.0              9.00
  10         2.0       3.6        1.6              0                0.00

N = 10                         ΣX = 16.0                     Σx^2 = 13.58

Mean of the difference,

$$\bar{X} = \frac{\Sigma X}{N} = \frac{16}{10} = 1.6$$

$$S = \sqrt{\frac{\Sigma x^2}{N-1}} = \sqrt{\frac{13.58}{9}} = \sqrt{1.51} = 1.23 \text{ approx.}$$

Let us take the hypothesis that there is no difference in the
effects of drugs A and B in producing sleep.

$$t = \frac{\bar{X}\sqrt{N}}{S} = \frac{1.6\times\sqrt{10}}{1.23} = \frac{5.06}{1.23} = 4.11$$

degrees of freedom = 10 - 1 = 9

For 9 degrees of freedom at 1% level of significance, the table
value of t = 3.25. The calculated value of t is greater than the
table value. Hence our hypothesis stands rejected and we
draw the conclusion that there is a significant difference between
the effects of the two drugs.
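
Both "difference" illustrations follow the same routine, sketched below in Python. The function name is hypothetical, and the drug figures are an assumption read from the solution table. With unrounded arithmetic t comes to roughly 1.48 for the coaching data and roughly 4.1 for the drug data.

```python
import math

def paired_t(first, second, mu=0.0):
    """t-test on the differences (second - first) of paired observations."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    mean = sum(diffs) / n
    s = math.sqrt(sum((d - mean) ** 2 for d in diffs) / (n - 1))
    t = (mean - mu) * math.sqrt(n) / s
    return mean, s, t, n - 1

# Geometry marks of the eleven boys before and after coaching.
test_1 = [23, 20, 19, 21, 18, 20, 18, 17, 23, 16, 19]
test_2 = [24, 19, 22, 18, 20, 22, 20, 20, 23, 20, 17]
mean_marks, s_marks, t_marks, df_marks = paired_t(test_1, test_2)

# Soporific drug data (values assumed from the solution table).
drug_a = [0.7, -1.6, -0.2, -1.2, -0.1, 3.4, 3.7, 0.8, 0.0, 2.0]
drug_b = [1.9, 0.8, 1.1, 0.1, -0.1, 4.4, 5.5, 1.6, 4.6, 3.6]
mean_drug, s_drug, t_drug, df_drug = paired_t(drug_a, drug_b)

print(round(t_marks, 2), df_marks, round(t_drug, 2), df_drug)
```

Compared with the table values quoted in the text (2.228 at 5% for 10 degrees of freedom, 3.25 at 1% for 9), the first t falls short and the second exceeds it.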

Test of difference between the means of two samples

When we want to test the significance of the difference
between two sample means the value of t is calculated by
applying the following formula :

$$t = \frac{\bar{X}_1-\bar{X}_2}{S\sqrt{\dfrac{1}{n_1}+\dfrac{1}{n_2}}} = \frac{\bar{X}_1-\bar{X}_2}{S}\times\sqrt{\frac{n_1 n_2}{n_1+n_2}}$$

where $\bar{X}_1$ and $\bar{X}_2$ are the means of the 1st and 2nd samples
respectively, $n_1$, $n_2$ are the numbers of items in the two samples and S
is the combined standard deviation of the two samples.
The value of S is obtained as follows :

$$S = \sqrt{\frac{\Sigma(X_1-\bar{X}_1)^2+\Sigma(X_2-\bar{X}_2)^2}{n_1+n_2-2}}$$

Note that t is based on $(n_1+n_2-2)$ degrees of freedom.

The following example would illustrate the use of the above
formula :

Illustration :

For a random sample of 10 pigs, fed on diet A, the increases
in weight in pounds in a certain period were :

10, 6, 16, 17, 13, 12, 8, 14, 15, 9

For another random sample of 12 pigs fed on diet B, the
increases in the same period were :

7, 13, 22, 15, 12, 14, 18, 8, 21, 23, 10, 17

Test whether diets A and B differ significantly as regards
their effects on increase in weight. (1954)

Solution :

Mean increase in weight of 10 pigs
fed on diet A,

$$\bar{X}_1 = \frac{\Sigma X_1}{n_1} = \frac{120}{10} = 12 \text{ lbs.}$$

Mean increase in weight of 12 pigs
fed on diet B,

$$\bar{X}_2 = \frac{\Sigma X_2}{n_2} = \frac{180}{12} = 15 \text{ lbs.}$$

The sums of squares of the deviations from these means are
$\Sigma(X_1-\bar{X}_1)^2 = 120$ and $\Sigma(X_2-\bar{X}_2)^2 = 314$, so that

$$S = \sqrt{\frac{120+314}{10+12-2}} = \sqrt{21.7} = 4.66$$

Let us take the hypothesis that diets A and B do not differ
significantly as regards their effects on increase in weight.
Applying the formula

$$t = \frac{\bar{X}_2-\bar{X}_1}{S}\times\sqrt{\frac{n_1 n_2}{n_1+n_2}} = \frac{15-12}{4.66}\times\sqrt{\frac{10\times12}{10+12}} = \frac{3}{4.66}\times\sqrt{5.45} = 1.5$$

degrees of freedom = $n_1+n_2-2$ = 10 + 12 - 2 = 20

For 20 degrees of freedom at 5% level of significance the
table value of t = 2.09. The calculated value of t is less than
the table value and hence our hypothesis holds good. In other
words, diets A and B do not differ significantly as regards their
effects on increase in weight.
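
The pooled calculation for the two diets can be sketched in Python as below. The function name is hypothetical, and the last two diet A values (15 and 9) are an assumption consistent with the stated sample size of 10 and mean of 12 lbs.

```python
import math

def pooled_two_sample_t(sample_1, sample_2):
    """Return (pooled S, t, degrees of freedom) for two independent small samples."""
    n1, n2 = len(sample_1), len(sample_2)
    mean_1, mean_2 = sum(sample_1) / n1, sum(sample_2) / n2
    ss_1 = sum((x - mean_1) ** 2 for x in sample_1)
    ss_2 = sum((x - mean_2) ** 2 for x in sample_2)
    s = math.sqrt((ss_1 + ss_2) / (n1 + n2 - 2))
    t = abs(mean_1 - mean_2) / s * math.sqrt(n1 * n2 / (n1 + n2))
    return s, t, n1 + n2 - 2

# Diet A: the final two values are assumed (see lead-in).
diet_a = [10, 6, 16, 17, 13, 12, 8, 14, 15, 9]
diet_b = [7, 13, 22, 15, 12, 14, 18, 8, 21, 23, 10, 17]

s, t, df = pooled_two_sample_t(diet_a, diet_b)
print(round(s, 2), round(t, 2), df)
```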

t-test of the Significance of an Observed Correlation
Coefficient

When we want to test whether r is or is not significantly
greater than zero, that is to say, if we are interested in testing
the 'null hypothesis' that the variables in the population are
uncorrelated, the following formula is to be used :

$$t = \frac{r}{\sqrt{1-r^2}}\times\sqrt{n-2}$$

t in this case is based on (n - 2) degrees of freedom.

The following example will illustrate the use of this formula.

Illustration :

A random sample of 18 pairs of observations from a normal
population gives a correlation coefficient of 0.52. Is it likely
that the variables in the population are uncorrelated ?

Solution :

Let us take the hypothesis that the variables in the
population are uncorrelated.

Applying the t-test,

$$t = \frac{r}{\sqrt{1-r^2}}\times\sqrt{n-2} = \frac{.52\times\sqrt{16}}{\sqrt{1-(.52)^2}} = \frac{.52\times4}{\sqrt{.73}} = \frac{2.08}{.854} = 2.44$$

degrees of freedom = 18 - 2 = 16

For 16 d.f. at 5% level of significance the table value of
t = 2.12. The calculated value of t is greater than the table
value. Hence at 5% level of significance there is reason to
doubt the hypothesis that the variables in the population are
uncorrelated.

Illustration :

How many pairs of observations must be included in a
sample in order that an observed correlation coefficient of value
.42 shall have a calculated value of t greater than 2.72 ?

Solution :

$$t = \frac{r}{\sqrt{1-r^2}}\times\sqrt{N-2}$$

$$\text{i.e., } 2.72 = \frac{.42}{\sqrt{1-(.42)^2}}\times\sqrt{N-2} = .463\sqrt{N-2}$$

$$\sqrt{N-2} = \frac{2.72}{.463} = 5.9$$

$$N-2 = (5.9)^2 = 34.81$$

So N = 36.81 or 37 approx.

Hence 37 pairs of observations must be included in a sample
in order that an observed correlation coefficient of value .42
shall have the calculated value of t greater than 2.72.
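
The two correlation illustrations above can be reproduced with a short Python sketch. The function names are hypothetical; the second function simply increases the number of pairs until the calculated t exceeds the target.

```python
import math

def t_for_correlation(r, n):
    """t = r / sqrt(1 - r^2) * sqrt(n - 2), based on n - 2 degrees of freedom."""
    return r / math.sqrt(1 - r ** 2) * math.sqrt(n - 2)

def min_pairs_for_t(r, t_target):
    """Smallest number of pairs for which the calculated t exceeds t_target."""
    n = 3
    while t_for_correlation(r, n) <= t_target:
        n += 1
    return n

t_18 = t_for_correlation(0.52, 18)      # first illustration
pairs = min_pairs_for_t(0.42, 2.72)     # second illustration

print(round(t_18, 2), pairs)
```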


Z-test for Testing Significance of 'r'

The t-test discussed above is applicable only for determining
whether the computed 'r' is significantly different from zero.
When we want to test the sample correlation against any other
theoretical value of 'r' (other than zero), or if it is desired to
test whether the two given samples have come from the same
population or not, the 't' test cannot be used.

Professor Fisher has shown that if 'r' is changed to another
statistic 'Z' by a suitable transformation then this testing is
possible. He has derived that if Z is calculated by

$$Z = \tfrac{1}{2}\log_e\frac{1+r}{1-r}$$

where e is the base for natural logarithms, i.e.,

$$Z = 1.1513\,\log_{10}\frac{1+r}{1-r},$$

the distribution of Z will be approximately normal.

The standard deviation of the Z distribution will be

$$\frac{1}{\sqrt{N-3}}$$

(where N is the size of the sample) and the arithmetic mean of
this distribution will be the Z corresponding to the population
value of 'r'.

The following illustrations will explain the use of the test.




tUuMmtim : 

A rmdom «stnpl« of 30 pain of observations from a normal 
f opu’atton shows a correlation coefficient of 75* Is this 
consistent with the assumption that the correlation in the 
population U ‘35 ? 

Solution :

Here the sample correlation is 0.75 and we have to test whether
this sample can come from a population with correlation 0.55.

Now $Z = 1.1513\,\log_{10}\frac{1+r}{1-r}$

$$\text{Sample } Z = 1.1513\,\log\frac{1+0.75}{1-0.75} = 1.1513\,\log 7 = 1.1513 \times 0.8451 = 0.9730$$

$$\text{Pop. } Z = 1.1513\,\log\frac{1+0.55}{1-0.55} = 1.1513\,\log\frac{1.55}{0.45} = 1.1513 \times 0.5371 = 0.6184$$

According to the Z-test, devised by Fisher, the distribution of Z
is a normal distribution with mean 0.6184 (μ) and standard
deviation $\frac{1}{\sqrt{N-3}} = \frac{1}{\sqrt{27}}$, or 0.1925 (σ).
The sample value to be tested is 0.9730 (X).

$$\text{Now } \frac{X-\mu}{\sigma} = \frac{0.9730 - 0.6184}{0.1925} = 1.84$$

As this value is less than 1.96 (i.e., the 5% level of significance
value from the normal distribution) the difference between the sample
value and the population value is not significant. In other words,
this sample can come from a population with correlation 0.55.
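
The arithmetic of this illustration can be checked with a short Python sketch. It is not part of the original text: the function names `fisher_z` and `z_ratio_against_population` are hypothetical, and natural logarithms are used directly in place of 1.1513 times the common logarithm.

```python
import math


def fisher_z(r: float) -> float:
    """Fisher's transformation: Z = 0.5 * log_e((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))


def z_ratio_against_population(r: float, rho: float, n: int) -> float:
    """Deviation of the sample Z from the population Z, in units of its S.D."""
    standard_deviation = 1 / math.sqrt(n - 3)
    return (fisher_z(r) - fisher_z(rho)) / standard_deviation


# Figures from the illustration: r = 0.75, assumed population value 0.55, N = 30
ratio = z_ratio_against_population(r=0.75, rho=0.55, n=30)
print(round(ratio, 2))     # about 1.84
print(abs(ratio) < 1.96)   # True: not significant at the 5% level
```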





Illustration :

Two independent samples have 28 and 19 pairs of observations
with correlation coefficients 0.55 and 0.75 respectively. Are
these values of r consistent with the hypothesis that the samples
are drawn from the same population ?

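Solution :

Our hypothesis is that the samples are drawn from the same
population.

$$Z = \tfrac{1}{2}\log_e\frac{1+r}{1-r} = 1.1513\,\log_{10}\frac{1+r}{1-r}$$

According to Fisher, $Z_1 - Z_2$ is distributed normally with mean
0 and standard deviation $\sqrt{\frac{1}{N_1-3}+\frac{1}{N_2-3}}$.

$$Z_1 = 1.1513\,\log\frac{1+0.55}{1-0.55} = 1.1513\,\log 3.44 = 1.1513 \times 0.5371 = 0.6184$$

$$Z_2 = 1.1513\,\log\frac{1+0.75}{1-0.75} = 1.1513\,\log 7 = 1.1513 \times 0.8451 = 0.9730$$

$$Z_2 - Z_1 = 0.9730 - 0.6184 = 0.3546$$

Standard deviation of the distribution of $Z_1 - Z_2$

$$= \sqrt{\frac{1}{25}+\frac{1}{16}} = \sqrt{0.1025} = 0.3202 \quad (\text{S.E. of } Z_1 - Z_2)$$

$$\frac{Z_2 - Z_1}{\text{S.E.}} = \frac{0.3546}{0.3202} = 1.107$$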






This is less than 1.96 (5% level of significance). There is,
therefore, no reason to doubt the hypothesis that the samples
are drawn from the same normal population.
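
A minimal Python sketch of the two-sample comparison follows. It is an illustration under stated assumptions rather than the author's own procedure: the names `fisher_z` and `z_ratio_two_samples` are hypothetical, and natural logarithms replace the 1.1513 × log₁₀ form.

```python
import math


def fisher_z(r: float) -> float:
    """Fisher's transformation: Z = 0.5 * log_e((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))


def z_ratio_two_samples(r1: float, n1: int, r2: float, n2: int) -> float:
    """(Z1 - Z2) divided by its standard error sqrt(1/(N1-3) + 1/(N2-3))."""
    standard_error = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (fisher_z(r1) - fisher_z(r2)) / standard_error


# Figures from the illustration: r = 0.55 (N = 28) and r = 0.75 (N = 19)
ratio = z_ratio_two_samples(0.55, 28, 0.75, 19)
print(round(abs(ratio), 2))   # about 1.11
print(abs(ratio) < 1.96)      # True: the difference is not significant
```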

The Variance-Ratio Test or the F-test


The object of the F-test is to discover whether the two
independent estimates of the population variance differ
significantly or whether the two samples may be regarded as drawn
from the same normal population of variance σ².

The variance ratio or

$$F = \frac{s_1^2}{s_2^2}$$

The numerator is always the greater variance. $s_1^2$ and $s_2^2$
are based on $(N_1 - 1)$ and $(N_2 - 1)$ degrees of freedom
respectively.

Illustration :


To assess the significance of possible variations in performance
in a certain test as between the grammar schools of a
city a common test was given to a sample of students taken at
random from the senior fifth form of each of the four schools
concerned. The results are given below. Make an analysis
of the variance of the data.


              Schools

      A      B      C      D

      8      7      5     10
      7      5      3      5
      4      5      4      6
      5      4      4      4
      5      3      3      8
      5      4      5      7
      6      6      4      8
      6      4      4      4





Solution :

To simplify calculations let us take 5 as the working
origin :

            A      B      C      D

            3      2      0      5
            2      0     −2      0
           −1      0     −1      1
            0     −1     −1     −1
            0     −2     −2      3
            0     −1      0      2
            1      1     −1      3
            1     −1     −1     −1

  Total     6     −2     −8     12     = 8

$$\text{Correction factor} = \frac{T^2}{N} = \frac{(8)^2}{32} = 2$$

Squaring the items

            A      B      C      D

            9      4      0     25
            4      0      4      0
            1      0      1      1
            0      1      1      1
            0      4      4      9
            0      1      0      4
            1      1      1      9
            1      1      1      1

  Total    16     12     12     50     = 90

Total sum of squares = 90 − 2 = 88

Sum of squares between samples

$$= \frac{(6)^2+(-2)^2+(-8)^2+(12)^2}{8} - 2 = \frac{248}{8} - 2 = 31 - 2 = 29$$

Sum of squares within samples = Total sum of squares −
Sum of squares between samples = 88 − 29 = 59









Analysis of variance table


  Source of Variation     S.S.    d.f.    M.S.

  Between Samples          29       3     9.67
  Within Samples           59      28     2.11

  Total                    88      31

$$F = \frac{9.67}{2.11} = 4.58, \qquad \nu_1 = 3,\ \nu_2 = 28$$

For ν₁ = 3 and ν₂ = 28, F.05 = 2.95. The calculated value of
F is greater than the table value and hence the difference
between the school means is significant.
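
The correction-factor method used above can be sketched in Python. This is a hedged illustration, not the book's own code: `one_way_anova` is a hypothetical helper, and the scores are the school data of this illustration as reconstructed from the totals in the solution (raw scores are used, since the sums of squares do not depend on the working origin).

```python
def one_way_anova(groups):
    """One-way analysis of variance by the correction-factor method."""
    n = sum(len(g) for g in groups)
    grand_total = sum(sum(g) for g in groups)
    correction = grand_total ** 2 / n                    # T^2 / N
    total_ss = sum(x * x for g in groups for x in g) - correction
    between_ss = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    within_ss = total_ss - between_ss
    df_between = len(groups) - 1
    df_within = n - len(groups)
    f_ratio = (between_ss / df_between) / (within_ss / df_within)
    return {
        "between_ss": between_ss,
        "within_ss": within_ss,
        "df": (df_between, df_within),
        "F": f_ratio,
    }


schools = [
    [8, 7, 4, 5, 5, 5, 6, 6],    # A
    [7, 5, 5, 4, 3, 4, 6, 4],    # B
    [5, 3, 4, 4, 3, 5, 4, 4],    # C
    [10, 5, 6, 4, 8, 7, 8, 4],   # D
]
result = one_way_anova(schools)
print(result["between_ss"], result["within_ss"], result["df"])
print(round(result["F"], 2))
```

The printed F is then compared with the table value for (3, 28) degrees of freedom, exactly as in the text.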

Illustration :

In a feeding experiment on twins, three rations R₁, R₂, R₃
were tried. The animals were put into three classes of three
each according to litter and initial body weight. The following
table gives the gains in body weight in lbs. in a certain period.


          Class I    Class II    Class III

  R₁         4          16          10
  R₂        14          18          19
  R₃         3          14           7


Analyse the data and state your conclusion. Has division
into classes proved effective ?

Solution :

Let us take the hypothesis that the division into classes has
no effect on gain in body weight.

Taking 10 as the working origin to simplify calculation :


            Class I    Class II    Class III    Totals

  R₁          −6           6           0           0
  R₂           4           8           9          21
  R₃          −7           4          −3          −6

  Total       −9          18           6          15


$$\text{Correction factor} = \frac{T^2}{N} = \frac{(15)^2}{9} = 25$$

Taking the squares of the items to find out total sum of
squares














            Class I    Class II    Class III    Totals

  R₁          36          36           0          72
  R₂          16          64          81         161
  R₃          49          16           9          74

  Totals     101         116          90         307

Total sum of squares = 307 − 25 = 282

Sum of squares between classes :

$$\frac{(-9)^2+(18)^2+(6)^2}{3} - 25 = \frac{441}{3} - 25 = 147 - 25 = 122$$

Sum of squares between rations :

$$\frac{(0)^2+(21)^2+(-6)^2}{3} - 25 = \frac{477}{3} - 25 = 159 - 25 = 134$$

Analysis of variance table :

  Source of Variation    S.S.    d.f.    M.S.    Variance ratio

  Between Rations        134      2      67      67/6.5 = 10.3
  Between Classes        122      2      61      61/6.5 = 9.38
  Residual                26      4       6.5

  Total                  282      8

ν₁ = 2, ν₂ = 4

Calculated values of F for classes and for rations are 9.38 and
10.3 respectively. Since these values are greater than the table
value at 5% level of significance (6.94 for ν₁ = 2 and ν₂ = 4),
our hypothesis stands rejected. Hence we conclude that division
into classes has proved effective.
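
The two-way analysis above can be reproduced with a small Python sketch. The helper name `two_way_anova` is hypothetical and the layout (rows for rations, columns for classes, one observation per cell) is an assumption taken from this illustration only.

```python
def two_way_anova(table):
    """Two-way analysis of variance, one observation per cell.

    Rows are treated as rations and columns as classes.
    """
    rows, cols = len(table), len(table[0])
    n = rows * cols
    grand_total = sum(sum(row) for row in table)
    correction = grand_total ** 2 / n                     # T^2 / N
    total_ss = sum(x * x for row in table for x in row) - correction
    row_ss = sum(sum(row) ** 2 for row in table) / cols - correction
    col_ss = sum(sum(col) ** 2 for col in zip(*table)) / rows - correction
    residual_ss = total_ss - row_ss - col_ss
    residual_ms = residual_ss / ((rows - 1) * (cols - 1))
    return {
        "row_ss": row_ss,
        "col_ss": col_ss,
        "residual_ss": residual_ss,
        "F_rows": (row_ss / (rows - 1)) / residual_ms,
        "F_cols": (col_ss / (cols - 1)) / residual_ms,
    }


gains = [
    [4, 16, 10],    # ration R1 in classes I, II, III
    [14, 18, 19],   # ration R2
    [3, 14, 7],     # ration R3
]
anova = two_way_anova(gains)
print(anova["row_ss"], anova["col_ss"], anova["residual_ss"])
print(round(anova["F_rows"], 2), round(anova["F_cols"], 2))
```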

Chi-square test and goodness of fit

In the previous chapter we have discussed certain techniques
that are helpful in finding out the relationship between fact
and theory (hypothesis). For example, in illustration no. 1 on
page 361, by computing the standard error of the mean we
established that there is a significant divergence between the
sample mean and the assumed hypothetical mean. In the
present chapter we shall discuss a particular significance test
known as the Chi-square test (written as χ² and pronounced as
ki-square test) which is used in a very large number of cases to
test the accordance between fact and theory.

The Greek letter χ² was first used by Karl Pearson in the
year 1900 to describe the magnitude of discrepancy between
theory and observations.

The statistic χ² may be defined as

$$\chi^2 = \sum \frac{(O-E)^2}{E}$$

where O refers to the observed frequencies of the sample,

E refers to the expected frequencies, i.e., frequencies
which we expect on the basis of some hypothesis,
and summation (Σ) extends over all the classes in the data.

It is thus apparent that the quantity χ² is derived by (1)
taking the difference between an observed frequency and the
expected frequency, (2) squaring this difference, and (3) dividing
the squared difference by the expected frequency.

The χ² test makes it possible to determine whether a given
discrepancy between theory and observation may be attributed to
chance or whether it results from the inadequacy of the theory
to fit the observed facts. The value of χ² will be zero if the
observed and expected frequencies completely coincide. As
the value of χ² increases the observed frequencies depart more
and more from the expected ones.

In order to determine whether the divergence between
observed and expected frequencies is due to chance or otherwise,
we have to compare the computed value of χ² with the
table values. Table values of χ² as given by R. A. Fisher are
available for various levels of confidence, ordinarily up to 30
degrees of freedom. If the calculated value of χ² is less than
the table value at the particular level of confidence (generally
it is the 5% level on which we test the hypothesis) the divergence
between the observed and the expected values is said to arise due
to fluctuations of sampling. If the calculated value of χ² exceeds
the table value the divergence between the observed and
expected frequencies is said to be significant.

Degrees of freedom

As pointed out above, in the application of the χ² test we have
to determine the degrees of freedom. The term degrees of
freedom refers to the number of classes the frequencies of which
could be filled in arbitrarily without violating any of the totals,
sub-totals etc. For example, 5 coins are thrown 700 times and
the following results are obtained :

  No. of heads     Actual frequencies

       0                  17
       1                 132
       2                 230
       3                 204
       4                 103
       5                  14

     Total               700

Now if we write the expected frequencies we have the
liberty to write any five figures but the sixth figure must be
equal to 700 minus the total of the five figures we have written, for
the simple reason that the total of expected frequencies must be
the same as that of actual frequencies. Hence there are five
degrees of freedom in this case. In such cases the number of
degrees of freedom is one less than the number of classes, i.e.,
n − 1.

In case of contingency tables the degrees of freedom are
obtained by applying the formula :

$$\nu = (r-1)(c-1)$$

where ν refers to the degrees of freedom,
r refers to the number of rows,
and c refers to the number of columns.

Thus in a 2 × 2 table the degrees of freedom are

ν = (2 − 1)(2 − 1) = 1 (because there are two rows
and two columns in a 2 × 2 table)

and in a 3 × 3 contingency table the degrees of freedom are :

ν = (3 − 1)(3 − 1) = 4










It should be noted that table values of χ² are ordinarily
available up to 30 degrees of freedom and for higher values we
make use of the fact that the distribution of χ² becomes nearly
normal. A good approximation is given by assuming that
$\sqrt{2\chi^2} - \sqrt{2n-1}$ (where n is the number of degrees of
freedom) is normally distributed about zero with
unit standard deviation. If this quantity exceeds 2, or even
1.645 for the 5% level, the value of χ² significantly exceeds
expectation.
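
As a rough sketch of this approximation in Python (the function name and the figures χ² = 60 with 40 degrees of freedom are hypothetical and chosen only for illustration):

```python
import math


def chi_square_normal_deviate(chi_square: float, degrees_of_freedom: int) -> float:
    """sqrt(2 * chi^2) - sqrt(2n - 1), treated as a unit normal deviate."""
    return math.sqrt(2 * chi_square) - math.sqrt(2 * degrees_of_freedom - 1)


# Hypothetical figures, not taken from the text
deviate = chi_square_normal_deviate(60, 40)
print(round(deviate, 3))
print(deviate > 1.645)   # True here, so chi-square exceeds expectation at the 5% level
```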

Conditions for the application of χ² test

The following precautions should be observed while
applying the χ² test :

(1) N, i.e. the number of observations, must be sufficiently
large, otherwise the differences between the actual and
expected frequencies would not be normally distributed. It
is difficult to say what constitutes largeness or smallness but
as an arbitrary figure we may say that N should not be less
than 30.

(2) No theoretical frequency should be very small.
Again it is difficult to decide what constitutes smallness. In the
opinion of Yule and Kendall 5 should be regarded as the very
minimum and 10 is better. When the theoretical frequencies
are less than 10 the adjoining classes should be merged together
so that the frequency exceeds 10, or in case of a 2 × 2 contingency
table Yates' correction should be made.

Yates' correction. One of the conditions for the
applicability of the χ² test is that no theoretical or expected
frequency should be less than 5 in any case; 10 is still better.
When the theoretical frequencies are smaller than 10 and
especially when smaller than 5, the ordinary table values of
χ² are inaccurate. This is especially true when there is only
one degree of freedom. It is true to a lesser extent for two
or three degrees of freedom. However, the error is negligible
when the degrees of freedom exceed three.

When there is only one degree of freedom, a simple variation
in the formula for χ² will adjust the calculated χ² so
that it is comparable with the table values of χ². In a 2 × 2
table the adjustment consists of adding 0.5 to those observed
frequencies which are less than the corresponding expected
frequencies and subtracting 0.5 from those which are more than
the expected frequencies. This is known as Yates'
correction. Yates' correction should not be applied when
degrees of freedom is greater than one. It should always be
made when degrees of freedom is only one and N is small.

Uses of χ² test

The χ² test is very widely used in statistical work. The following
are the three most important situations where the χ² test is used.

(1) To test the discrepancies between observed and expected
frequencies. The χ² test can be applied in those cases where we want
to know whether the difference between the observed and
expected frequencies arises due to chance or whether it results
from inadequacy of the theory to fit the observed facts.

(2) To test the goodness of fit. The χ² test is the most important
of all the tests used to find out closeness of fit. When we fit
an ideal frequency curve, whether normal or some other type,
to the data, we are interested in finding out how well
this curve fits the observed facts. This is a problem of
goodness of fit. Although by inspection one may say whether
the fit is good or not, precision can be secured by applying
the χ² test. The χ² test can be applied to all forms of curve
fitting where we can know the distribution of the classes to
the means of which we are attempting to fit the curve.

(3) To determine association between two or more attributes.
The χ² test is widely used to find out whether or not there is
any association between two or more attributes. For example,
by applying the χ² test we can find out whether there is any
association between the colour of father's eyes and son's eyes.
In such cases we proceed on the 'Null hypothesis' that
there is no association between the attributes. If the calculated
value of χ² (at a certain level of significance) is less than the
table value the hypothesis holds good; otherwise we reject the
hypothesis and draw the conclusion that there is association
between the attributes.

The following examples will illustrate the use of the
χ² test.





Illustration :

A die is thrown 132 times with the following results :

  No. turned up     1     2     3     4     5     6

  Frequency        16    20    25    14    29    28

Test the hypothesis that the die is unbiased.

Solution :

On the basis of the hypothesis that the die is unbiased we
should expect each number to turn up 132/6 = 22 times.

Applying the χ² test :

     O       E     (O−E)²    (O−E)²/E

    16      22       36        1.64
    20      22        4        0.18
    25      22        9        0.41
    14      22       64        2.91
    29      22       49        2.23
    28      22       36        1.64

                     Total     9.01

$$\chi^2 = \sum\frac{(O-E)^2}{E} = 9.0$$

For 5 d.f. at 5% level of significance the table value of χ² =
11.07. The calculated value of χ² is less than the table
value and hence there is no evidence against the hypothesis
that the die is unbiased.
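
The same computation can be sketched in Python; `chi_square` is a hypothetical helper name, and the table value 11.07 is the one quoted in the text for 5 degrees of freedom.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [16, 20, 25, 14, 29, 28]                    # 132 throws of the die
expected = [sum(observed) / len(observed)] * len(observed)   # 22 for each face
statistic = chi_square(observed, expected)
degrees_of_freedom = len(observed) - 1

print(round(statistic, 2), degrees_of_freedom)
print(statistic < 11.07)   # True: no evidence against the hypothesis
```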

Illustration :

The figures given below are (a) the theoretical frequencies
of a distribution and (b) the frequencies of the Poisson distribution
having the same mean and total frequency as in (a).
Apply the χ² test of goodness of fit.

  (a)   305   365   210   80   28   9   3

  (b)   301   361   217   88   26   6   1









Solution :


      O           E        (O−E)²    (O−E)²/E

     305         301         16        0.05
     365         361         16        0.04
     210         217         49        0.23
      80          88         64        0.73
      28          26          4        0.15
   9+3 = 12    6+1 = 7       25        3.57

                             Total     4.77

$$\chi^2 = \sum\frac{(O-E)^2}{E} = 4.77$$

d.f. = 7 − 3 = 4

(The number of d.f. is one for each class less one for each
'restraint'. The original 7 classes have been reduced to 6 by
grouping, thus reducing the d.f. by 1. In addition, the mean
and the total frequency of the original distribution have been
used in calculating the theoretical frequencies, thus introducing
two restraints. The number of d.f. is accordingly 4.)

For 4 d.f. at 5% level of significance the table value of
χ² = 9.49. The calculated value of χ² is less than the table
value and hence the fit is good.
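
The grouping of the small end classes and the count of restraints can be sketched in Python. The helper `pool_small_tail` and the threshold of 5 are assumptions made for illustration; the text itself only says that small theoretical frequencies should be merged with adjoining classes.

```python
def pool_small_tail(observed, expected, minimum=5):
    """Merge the last class into its neighbour while its expected frequency is small."""
    obs, exp = list(observed), list(expected)
    while len(exp) > 1 and exp[-1] < minimum:
        last_o, last_e = obs.pop(), exp.pop()
        obs[-1] += last_o
        exp[-1] += last_e
    return obs, exp


def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [305, 365, 210, 80, 28, 9, 3]
expected = [301, 361, 217, 88, 26, 6, 1]    # fitted Poisson frequencies
obs, exp = pool_small_tail(observed, expected)
statistic = chi_square(obs, exp)
# one restraint for the total frequency and one for the fitted mean
degrees_of_freedom = len(obs) - 2

print(obs, exp)
print(round(statistic, 2), degrees_of_freedom)
```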

Illustration :

The figures given below are (a) the theoretical frequencies
of a distribution and (b) the frequencies of the normal distribution
having the same mean, standard deviation and total
frequency as in (a). Apply the χ² test of goodness of fit.

  (a)   1   12   66   220   495   792   924   792   495   222   66   12   1

  (b)   2   15   66   210   484   799   943   799   484   210   66   15   1

Solution :

We are given (a) the observed frequencies and (b) the
expected frequencies of the normal distribution. We can
apply the χ² test of goodness of fit.







       O             E         (O−E)²    (O−E)²/E

   1+12 = 13     2+15 = 17       16        0.94
      66            66            0        0.00
     220           210          100        0.48
     495           484          121        0.25
     792           799           49        0.06
     924           943          361        0.38
     792           799           49        0.06
     495           484          121        0.25
     222           210          144        0.69
      66            66            0        0.00
   12+1 = 13     15+1 = 16        9        0.56

                                Total      3.67

$$\chi^2 = \sum\frac{(O-E)^2}{E} = 3.67$$

The number of degrees of freedom = 13 − 5 = 8

(The number of d.f. is 1 for each class, less one for each
'restraint'. The 13 original classes have been reduced to 11
by grouping, thus reducing the d.f. by 2. In addition, the mean,
the standard deviation and the total frequency of the original
distribution have been used in calculating the theoretical
frequencies, thus introducing three restraints. The number of
d.f. is accordingly 8.)

For 8 d.f. at 5% level of significance the table value of
χ² = 15.51. The calculated value of χ² is less than the table
value and hence the fit is good.

Illustration :

Investigate the association between the darkness of eye
colour in father and son from the following data :

                                                           Frequency

  Fathers with dark eyes and sons with dark eyes               50

  Fathers with dark eyes and sons with not dark eyes           79

  Fathers with not dark eyes and sons with dark eyes           89

  Fathers with not dark eyes and sons with not dark eyes      782

                                                   Total    1,000







Solution :

The above information can be arranged in the form of a
2 × 2 table as follows :

                            Colour of fathers' eyes

  Colour of sons' eyes      Dark      Not dark      Total

  Dark                       50          89           139
  Not dark                   79         782           861

  Total                     129         871         1,000

Let us take the hypothesis that there is no association between
the colour of the fathers' and that of the sons' eyes. On
the basis of this hypothesis the expected frequencies are :

                            Colour of fathers' eyes

  Colour of sons' eyes      Dark                              Not dark      Total

  Dark                      (139/1,000) × 129 = 18 approx.      121           139
  Not dark                  111                                 750           861

  Total                     129                                 871         1,000

      O       E      (O−E)²    (O−E)²/E

     50      18      1,024       56.9
     79     111      1,024        9.2
     89     121      1,024        8.5
    782     750      1,024        1.4

                      Total      76.0

$$\chi^2 = \sum\frac{(O-E)^2}{E} = 76.0$$

d.f. = (r − 1)(c − 1) = (2 − 1)(2 − 1) = 1














For one degree of freedom at 5% level of significance the
table value of χ² = 3.84. The calculated value of χ² is much
greater than the table value and hence our hypothesis stands
rejected. We, therefore, conclude that there is association
between darkness of eye colour in father and son.
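
A hedged Python sketch of the contingency-table calculation is given below. The helper names are hypothetical. Note that the sketch keeps the expected frequencies unrounded (17.931 rather than 18), so it gives a χ² of about 76.5 where the hand calculation with rounded expected frequencies gives about 76; the conclusion is the same.

```python
def expected_frequencies(table):
    """Expected cell frequencies: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_contingency(table):
    """Chi-square and degrees of freedom (r - 1)(c - 1) for a contingency table."""
    expected = expected_frequencies(table)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, degrees_of_freedom


# Rows: sons dark / not dark; columns: fathers dark / not dark
eyes = [[50, 89], [79, 782]]
statistic, degrees_of_freedom = chi_square_contingency(eyes)
print(round(statistic, 1), degrees_of_freedom)
print(statistic > 3.84)   # True: association is significant at the 5% level
```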

Illustration :

In an experiment on immunisation of cattle from tuberculosis
the following results were obtained :

                      Affected      Not affected

  Inoculated             12              26

  Not Inoculated         16               6

Discuss the effect of vaccine in controlling susceptibility to
tuberculosis.

Solution :


Let us take the hypothesis that vaccine has no effect in
controlling susceptibility to tuberculosis. On the basis of this
hypothesis the expected frequencies are :

                      Affected                       Not Affected      Total

  Inoculated          (38/60) × 28 = 18 approx.           20             38
  Not Inoculated      10                                  12             22

  Total               28                                  32             60

Since there is only one degree of freedom and the frequencies are
small, Yates' correction is applied to the observed frequencies :

      O        E      (O−E)²    (O−E)²/E

    12.5      18      30.25       1.68
    15.5      10      30.25       3.02
    25.5      20      30.25       1.51
     6.5      12      30.25       2.52

                       Total      8.73

$$\chi^2 = \sum\frac{(O-E)^2}{E} = 8.73$$

d.f. = (r − 1)(c − 1) = (2 − 1)(2 − 1) = 1









For one degree of freedom at 5% level of significance the
table value of χ² = 3.84. The calculated value of χ² is greater
than the table value and hence our hypothesis stands rejected.
We, therefore, conclude that vaccine is effective in controlling
susceptibility to tuberculosis.
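
Yates' correction, as applied in this illustration, moves each observed frequency half a unit towards its expected frequency before squaring. A minimal Python sketch follows; the function name is hypothetical, and the expected frequencies are the rounded values 18, 20, 10 and 12 used in the solution.

```python
def yates_chi_square(observed, expected):
    """Chi-square for a 2 x 2 table with each |O - E| reduced by 0.5."""
    total = 0.0
    for o, e in zip(observed, expected):
        corrected = max(abs(o - e) - 0.5, 0.0)   # never correct past zero
        total += corrected ** 2 / e
    return total


observed = [12, 26, 16, 6]     # inoculated: affected, not affected; then not inoculated
expected = [18, 20, 10, 12]    # rounded expected frequencies from the solution
statistic = yates_chi_square(observed, expected)
print(round(statistic, 2))
print(statistic > 3.84)        # True: the hypothesis of no effect is rejected
```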

Illustration :


In a recent diet survey the following results were found in
an Indian city :

  No. of families      Hindus      Muslims      Total

  Taking tea            1,236        164        1,400
  Not taking tea          564         36          600

  Total                 1,800        200        2,000


Discuss whether there is any significant difference between
the two communities in the matter of tea-taking.

Solution :


Let us take the hypothesis that there is no difference between
the two communities in the matter of tea-taking. On the basis
of this hypothesis the expected frequencies would be :

  No. of families      Hindus                             Muslims      Total

  Taking tea           (1,400/2,000) × 1,800 = 1,260        140        1,400
  Not taking tea       540                                   60          600

  Total                1,800                                200        2,000

       O         E       (O−E)²    (O−E)²/E

    1,236     1,260       576        0.46
      164       140       576        4.11
      564       540       576        1.07
       36        60       576        9.60

                          Total     15.24

$$\chi^2 = \sum\frac{(O-E)^2}{E} = 15.24$$

d.f. = (r − 1)(c − 1) = (2 − 1)(2 − 1) = 1

For 1 d.f. at 5% level of significance the table value of
χ² = 3.84. The calculated value of χ² is greater than the table
value. Our hypothesis, therefore, stands rejected and we
conclude that there is significant difference between the two
communities in the matter of tea-taking.













EXERCISES 

1. Explain the following terms :

(a) Level of significance,

(b) Null hypothesis.

fpss) 

r whai it v-t«t > (m.a. OitK mm 

3. Explain the method used for testing the difference between means of
two samples.

4. Distinguish carefully between parameters and statistics.

(M.A., Delhi, 1957)

5. Explain the term 'standard error' of a statistic. What is its use in
Statistics ? (I.A.S.)
6. In a certain sample study following are the results :

Size of the sample 300, Arithmetic mean 506, and standard
deviation 16.8. Does the arithmetic mean differ significantly from
480 ? [14.1]

7. The following table gives the distribution of families according to the 
number of persons in each. 

No of persons 1 2 3 4 5 6 7 8 9 

No. of families 107 245 228 174 J0C> 65 35 21 19 

Calculate the mean and standard deviation of the above distribution.
Does the average size differ significantly from 5 ?

(B.Com., Delhi, 1955) [14.2]

8 . Medial income of 1 50 labourers selected at random from a certain 

province is Rs. IQS 75 p.m. with standard deviation Rs. 2547, I>u 
you have reasons 10 believe that average income of the labour commu¬ 
nity in this distiict is Rs. 100 .? f 14.31 

9. A sample of size 400 is selected from a population of very large size.

Quartiles of the sample are found to be 17 and 29. The standard
deviation of the items was 10. Can the sample come from a population
with quartiles 15 and 30 ? [14.4]

10 From a sample of pea plants the number of round f»ea» is i% and the 

number of angular peat is 10). It (his in agiermem with the Mende- 
Uan theory which says (hat on the average (he ratio of round to angular 
peas «* 3 : I ? [14.5] 

11. In a large city 327 men out of a random sample of 600 men were found
to be smokers. Does this information support the view that the
majority of men in this city are smokers ?

(B.Com., Delhi, 1954) [14.6]

12. In 315,672 throws of a die a 5 or 6 turned up 106,602 times. Is this

sufficient evidence that the die is biased ? (Weldon's data) [14.7]

13. A sample of size 600 selected from a city gives that males are 53%. Is

it a sufficient evidence against that in the city males and females are
equal in number ? [14.8]

14. In a sample of size 250, the standard deviation is found to be 13.6.

The coefficient of variation is 78%. Can the population standard
deviation be 15 and coefficient of variation 75% ? [14.9]

15. in * sample of site 10 Q coefficient of correlation was found to be CM. 

Is it a significant figure ? [14,10] 




*6. A sample of smm* 1M give* the value of a x a* 0 56 and the value of «* 
*s 0*29. Can the population value* be 0 25 am) 0-75 > f I4.ilj 

17. A sample of war 2,400 give* the value of y t ami fa a» 037 and 1*4®. 
Can this sample come from normal population t (14 12] 

Iff A sample of *i«e 175 give* the following constants : 

A.M 45 8. Median 50*0, quartile first 10*9, qua rule upper 74 S» 
Standard deviation 15’7» Y ( ** 107, fa** $'79. Find the (i) 95% and 
fill 99% confidence range* within whkh the population constants 

can He. (14*14) 

19. A random sample of 500 pineapples was taken from a large consignment
and 65 were found to be bad. Estimate the proportion of bad
pineapples in the consignment, as well as the standard error of the
estimate. Deduce that the percentage of bad pineapples in the
consignment almost certainly lies between 8.4 and 17.4.

i,u, rm' fi4;i5j 

ab, Following table ft,i\e* the diittibution of marks of 100 student* selected 
at random from student* of a certain untvmiiv, Find the range with* 
in which the median* quartile, standard deviation, #| and of the 
marks of all the st udent* of t he university iio 
Marks :            20-30, 30-40, 40-50, 50-60, 60-70, 70-80, 80-90

No. of Students :    5     18     25     26     12      8      6

[14.22]

21.. Two samples give the follow mg result* : 1 sample : A. Mean 14*72, 

*urtdsL«d deviation 3 45 and sample fa) 100. It sample : A. Mean 
18-hi, standard deviation 4*72 and *i?e 125. Have you reasons to 
Induce that the two arithmetic means are different ? 

' Af Cam., fhlhi, ifrff i (1449) 

22. The subject under investigation is the measure of dependence of Tamil
on words of Sanskrit origin. One newspaper article reporting the
proceedings of Constituent Assembly contained 2,025 words of which
729 words were declared by a literary critic to be of Sanskrit origin.
A second article by the same author describing atomic research
contained 1,600 words of which 640 words were declared by the same
critic to be of Sanskrit origin. Assuming that simple sampling
conditions hold, estimate the limits for the proportion of Sanskrit terms in
the writer's vocabulary, and examine whether there is any significant
difference in the dependence of this writer on words of Sanskrit origin
in writing on these two subjects. (/ .IX. ipgy) (14 20) 

21 Two sample* give the following constants Is there any significant 
difference between the constants ? 

Sample 2 : 

Median 37-5, St. D,;vi.~8*5, £|» i-76* 0*™246 and Aw KM). 

Sample II : 

Median~39*6. Si, Devi.- 10*3, 1*23. and A 225. 

(14.21) 

24. In a consumers' preference survey by simple sampling method, 56%
of persons gave preference in city A for a certain commodity. In city
B the similar percentage was 52. If the sample selected in the first case
was of 600 and in the second case it was of 500, what are your
conclusions regarding the difference in preferences in the two cities ?






23. A random lampfc of 200 village* wjm taken from * dwuwt and thr 
average population per village wga found to be 4 05 with standard 
deviation of 50. Another random wm^k of 200 village* from the 
Mm** dittwt nave an average* population of 310 per village with Han- 
dard deviation of 40 la the differeoce between the average of the 
two tatnplea fttatiaiieally tignificant ? ( M. A . IMfn. /95 jj {1 4-23} 

26. A trample of tore 16# was diawn from a certain population- Another 

•ample of lire 225 wait drawn from another population Thr standard 
deviation of both the; population* i» known u> he 12. The two aamplet 
gave the following value* of ibr other constant* U there any »i*eni- 
?i«,«nt differ core in thear in the two population* ' 

Sample I : 

ArithmeiK mean 5B*7. Median 6? #, quartile tint 414, h, 18# and 
J»s 4‘<* 

Sample 11 r 

Arithmetic mean 62*4, Merit an 65 T lower qua role '48 0 , 103 and 

A, 4*07. 114-25] 

27, A lample study of the population of two dm tot* give* thAi ir. district 

A the percentage of male population n 32 while tin* ♦* the figure for the 
female pm-entag* in thr D dtMnct If the »ize of the sample x«‘rcted 
in dolriel A and B i* 400 and 625 respectively. '■an l*a<h of these jumple* 
hr taken income from a population where thr pet rentage »d male* and 
female* it equal •* {M 27 j 

211 By a <omumen’ preference survey a firm found horn a sample of ikr 
5#fX4 that three •fourth* of the person* prefrrml ihrtrcutmiKvioin. The* 
then planned a Urgr^iwlf advertiimieiit camp,’*opr At the end 

of one year they found from a sample of ««ae ! ,000 that nrm four-fifths 
td the population purchased their comrmwlu »r*, Was thr advri r in*merit 
campaign beneficial to the firm ? {14.28] 

2V A simple •ample of height of 6,400 Englishmen has a mean of f./MiV 
and a standard deviation of 2 Mi*, while a sample, of height of l,600 
Austral urns hit* a mean 68 "Vi' and a stand*! d deviation of 2\52*'. Du 
the data indicate that Australian* are on no average taller' than 
Engiithmen (A C$m , Dt!hi t t?w' |14 30] 

50. It it deairrd to know whether urban families r<»n*umr significantly 
mure tea than rural families. The average purchase per family is 
found to he 3 Ih per vear with a variance of i 4, whereas a sample of 
l *0 rtnai families reveal then average annual tea pmcha.se to be 2’4 lb. 
with a variance of l f». What is your conclusion "’ 

! B.Citm., Ihthi , njtf 1,14.31) 

31 in a population of six unit* the value i»f th** units given by 8 . I, 3* 
H. 4, ~ How inanv simple random samples uf sue 2 can be drawn 
from the p-opuiaricm * Calculate the tamper mean , * ■ for all possible 
•ample* uf shre 2 . Wrifv that > i* an unbiased estimate uf the popu¬ 
lation mean. >A4 ,4-, /.W 6 j, t 95 g) 

.12. A drain take# a sample of I Oil bum a csm*tg»mietit of one lakh item* 
o.l a certain good* He limit that in the sample there ace 45 item* <#f 
grade l worth fU Vt pet dusen, 3$ items of grade H worth Hi. IB per 
(torn* And remaining item,* were of grade III worth H*.- 12 pec doarxi. 

'Vithiu what limits shook! the value of the •.•'Unsigrmieni he fixed ? 

IHM) 

33. Two investigators investigated the possibility of thought reading. One
threw a die unseen by the other and, having recorded it, thought of the
number that turned up for 30 seconds. The second investigator wrote
down what number he thought it was. On checking their figures
there was agreement 118 times in 600 throws. Is this result significant
(a) on the 1% level, (b) on the 5% level ? [14.33]

34. A sample of tilt 10 has arithmetic mean a# 54r8 and standard do ration 
16 4 Can it come front a population with mean 30 ? (15.2] 

33 A reruin tiimulus administered to each of 12 patients resulted in (he 

following simtt ease of blood pressure; 5, 2, 8, 1, 3, 0,-31, 1, 5, 0, 

4, fi. Gan it he concluded that the stimulus will in gmrntl iw ate urn* 
panird by an increase m Wood prcaattrt ? (A f A., DM. n}$f) 113.4) 

36. The yields of two type* ’Type 17' and r I\p« 51' of gram to pound* 

per acre at f> replications are given below What comments would 
vou make on the difference in (he mean yield ? You may assume that 
if there be 5 degrees of freedom F is equal to (h2, t is approximately 

1*4 76 . 


Replica tiort 

1 7 

3 4 

5 

6 

Yield m lb. Type 17 

20 50 24-60 

23 06 29 911 

SO-37 

23-83 

Yield to lb. Type 51 

24 80 20 39 

2819 30 7 5 

2fr?M 

22 04 

37. Rainfall at two places A and B for 10 years are as below :

  Year        1    2    3    4    5    6    7    8    9   10

  Place A    40   30   34   39   43   25   49   40   45   55

  Place B    39   28   34   35   41   23   45   37   43   55

Is there any significant difference in the rainfalls for two places
taking data (a) as two independent samples, (b) as paired up values ?

(The table value of t for 18 d.f. and 9 d.f. respectively at 5% level is
2.101 and 2.262.) [15.7]

38 » ' Samples of me 8 and 12 are drawn from two batches of certain 

gnndi. The coefficient of correlation between two characteristics of 
the articles are 0 32 and 0 J9 respectively. Are ihese values significant '* 

{bj It was found that the correlation coefficient between two- 
variables calculated from a sample of sire 25 was 0 37, 1>«* this show 

rvideme of having come from a population with gr.ro cor(dacron ? 

1 The table value of V for fi. 10 and 23 degree* of freedom is 2 447, 

2 228, and 2*069 respectively,:. (15.131 

39. A sample of size 12 gives the value of coefficient of correlation as 0.75.
Can the population value be 0.6 ? [15.14]

40. A sample of 15 students gives the coefficient of correlation between
height and weight as 0 0. A sample of 12 office employees gives the
similar coefficient (H>. Is the difference significant ? (15.15)

41. The following table gives the marks of four students in three different
examinations. Do the average levels of knowledge of the four students
differ significantly one from the other ?


Student      A     B     C     D
Exam. I      56    54    32    49
Exam. II     35    48    39    62
Exam. III    57    51    54    49

(Note : F value for (3, 8) d.f. at 5% level of significance is 4.066.)
[16.9]

42. Two experimenters determine the moisture content of samples of a
powder, each man taking a sample from each of three consignments.
Their measurements are given below. Discuss whether there is any
significant difference between the results of the two observers.



406 AW INTRODUCTION TO STATISTICAL METHODS 


Observer     Consignments
             1     2     3
1            9     10    9
2            12    11    9

State the assumptions you make. (M.A., Delhi) [16.13]

43. Genetic theory states that children having one parent of blood type
M and the other of blood type N will always be of one of the three
types M, MN, N and that the proportions of these types will on an
average be 1 : 2 : 1. A report states that out of 300 children having
one M parent and one N parent, 30% were found to be of type M,
45% type MN and the remainder of type N. Test the hypothesis by
chi-square test. (I.A.S.)

(For 2 d.f. at 5% level the table value of chi-square is 5.99.) (ITS]

44. In a radio listeners' preference survey 120 persons were interviewed and
their opinions were as follows :

Type of music    Language A    Language B
I                13            45
II               39            23

Examine whether the preference for music type is dependent on
language. Also test the hypothesis that half of the radio listeners
prefer language A to B. (M.A., Delhi)

(Note : for 1 d.f. at 5% level the table value of chi-square is equal to
3.84.) [13.5]

45. Two investigators study the incomes of a group of persons by the
method of sampling. Following results were obtained by them :

Investigator    Poor    Middle class    Well-to-do    Total
A               160     30              10            200
B               140     120             40            300
Total           300     150             50            500

Show that the sampling technique of at least one of the investigators
is suspected.

(For 2 d.f. at 1% level of significance the table value of chi-square is
9.21.) [13.6]

46. The following table shows price increases and decreases in markets
where credit squeeze is in operation and where it is not in operation.

Credit squeeze      Price-decrease    Price-increase    Total
In operation        862               10                872
Not in operation    582               18                600
Total               1,444             28                1,472

Find whether the credit squeeze has been effective in checking
price-increase. (M.A., Delhi) [13.9]

47. In an experiment on the immunization of goats from anthrax the
following results were obtained. Derive your inference on the
efficiency of the vaccine :

                           Died of anthrax    Survived    Total
Inoculated with vaccine    2                  10          12
Not inoculated             6                  6           12

(I.A. & A.S.) [11.11]

48. When the first proof of 300 pages of a book of 5,000 pages was read,
the distribution of printing mistakes was found to be as follows :

No. of mistakes in a page 0 I 2 3 4 5 

No of pages M2 03 20 3 I t 



SAMPLING AW© STATISTICAL INFUIK(Z 407 

Fit a Poisson law to the above distribution and apply the suitable
test of goodness of fit. (B.Com.) [13.15]

49. A public opinion poll was taken in five areas and the percentage of
persons in favour of a certain proposal is noted below :

Area A B C D E 

No* of person* !«2 200 360 m 150 

Percentage of 

persons in favour 6* 70 52 4b 83 

Find out if the support for the proposal referred to above, as
measured by the percentage of persons in favour of it, was the same in
all the areas.

(For 4 d.f. at 1% level the chi-square value is 13.28.) (18.20)

50. In a certain experiment chi-square was calculated as 58.32 and the degrees
of freedom were 41. What can you say about the hypothesis tested ?
What will be your conclusions if chi-square was 60.7 ? (18.24)



Chapter 16 
The Analysis of Time Series 


So far we have been discussing the various methods that are
employed to describe and analyse data which exist at a point
of time (or data in which time is not an important factor). In
this and the subsequent chapter are described methods that
are used for the treatment of time series.

A time series is an arrangement of statistical data in
accordance with the time of its occurrence. Such series are of
particular importance in the field of economic statistics, for
most of the data (population, bank clearings, bank deposits,
output, sales, profits, etc.) in economics and in business are of
this type. In no other field of statistical research do time
series occupy such an important position. It is natural, therefore,
that appropriate methods for the analysis of time series
have been evolved only recently with the adoption of statistical
methods in the field of economics and commerce.

When the data are segregated by days, months or years we
would find that the variable under investigation is fluctuating
from time to time. These fluctuations are the result of the
joint action of various forces. The object of the analysis of a
time series is to isolate and measure the separate effects of these
forces as they appear in a given series over a period of time.
Such an analysis will help us in understanding the past behaviour
of a variable in order that we may be able to predict its
future tendencies. To business executives, who are to plan
their production programmes much ahead of the actual sales,
such an analysis is of great assistance, for it is with the help of
analysis of this nature that approximately correct estimates of
the future demand can be made. The changes or movements
of a time series may be classified as :





(i) Secular Trend.

(ii) Periodic changes :

(a) cyclical variations, and
(b) seasonal variations.

(iii) Irregular or Random fluctuations.

Secular Trend 

A study of the series of economic and business statistics
would reveal that most of them have a natural tendency to
increase or decrease over a period of several years. For
instance, a scrutiny of agricultural production in India during
the last fifty years would show that the production has been in
general on an increase, and that the increase has been fairly
regular. The same is true of population, production of steel,
bank deposits, currency in circulation, etc. The underlying
factor causing an upward trend in a time series may be the
application of natural sciences in the fields of agriculture and
industry, the changes in the forms of business organisation
facilitating accumulation of huge capital for specialisation and
mass production, the introduction of automation, scientific
management, quality control, improved marketing, etc., to
raise the standard of living, productivity, etc.

Not all time series show an upward trend. A declining
rate is noticed in the data of epidemics, deaths and births, etc.,
owing to better and widely available medical facilities and
higher standard of living. In economic series also a declining
trend may be found due to keen competition (say, of Roadways
against Railways), invention of a cheaper or better substitute, e.g.,
synthetic rubber for natural rubber, etc. Authorities like
Raymond R. Prescott are of the view that the Law of Growth
(embracing (i) period of experimentation when growth is small,
(ii) period of growth into the social fabric, (iii) period during
which growth is retarded as a saturation point is approached,
and (iv) period of stability) applies to all industries, and
consequently not only does relative growth tend to decline, but
eventually further expansion will be physically impossible.




This tendency of growth (positive or negative) is called
secular trend, or secular movement. Thus "secular movement
is that irreversible movement which continues in general in the
same direction for a considerable period of time."

Let us discuss the parts of the definition. The words
"irreversible movement" indicate that it does not change its
direction as frequently as a so-called 'four phase' cycle composed
of prosperity, recession, depression and recovery.

The words "continues in general in the same direction"
tell that it has a gradual and persistent tendency to change in
the same direction. But this does not mean that the rise or
fall must continue each and every year throughout the period.
If we are given the data for about 30 years during which
production of a particular commodity tends generally to rise,
we should say that there was a secular rise in production during
the period, even though there might be a single year or two in
which there was some fall in production.

The words "for a considerable period of time" convey the
idea that the given movement must last for a period that one
would call a long time, long for such data to continue to
change uniformly. "If we are counting the bacterial population
of a culture every 5 minutes, and the population
continues to increase fairly regularly for many days, we
should say that this was a secular change. Again, if we are
given production figures of a particular commodity only for
12 or 24 months, the mere fact of increase in production for
two successive years would not suggest a secular change." 1
Thus a period of 5 days may be secular under one condition,
whereas a period of 2 years may not be secular under other
conditions.

It may be pointed out that the logic of the analysis of the
secular trend is not the same for all affairs. The forces
influencing the growth of the individual concern are different 
from those affecting the industry of which the particular 


1 Waugh, Elements of Statistical Method.




concern is a member, and the factors influencing the economy
as a whole are different from those operating in a given
industry.

The statistical problem in the trend analysis of a time
series lies in (1) deciding the type of trend which will fit the
data satisfactorily, and (2) fitting the trend of the type decided.
This problem of trend analysis arises because one may be
interested in the trend itself, or one may wish to eliminate
the trend statistically in order to get one or more of the other
movements in the series.

Periodic Changes 

Operating along with the growth factor there is another 
group of causes which exert their influence upon the movements 
of a variable. These causes do not operate continuously but 
in a spasmodic manner, and their resultant variation may be 
classified as (a) Cyclical Variation, and (b) Seasonal Variation. 

Cyclical Variations. In a large number of time series of economic
data it has been observed that there is a somewhat periodic
up and down movement. These movements are known as cyclical
variations as they pursue an oscillating movement which, in
general, takes the form of a wave, though the distances from
peak to trough of the waves are uneven. Such cycles are generally
repeated at intervals ranging from about 3 to 10 years, and
are caused by a complex combination of forces affecting the
equilibrium of demand and supply. Cycles generally exhibit
semi-regular periodicity as these are neither as regular as are the
seasonal variations, nor as accidental as are the random fluctuations.
Further, these movements, also called Business Cycles,
are of longer duration than the 12 month seasonal variations.
These cycles also differ from the trend. Even when an industry
may be showing an upward trend, it is nevertheless possible that
general business conditions may at times take a prolonged
adverse turn so as to depress the values in the time series well
below those tending in earlier years. Similarly, in times of
high level of production caused by war, etc., the values of the





series may be well above those toward which the data seemed
to be moving. Thus business cycles, essentially the outcome of
the Industrial Revolution, reflect the alternations in general
prosperity and depression. "It is the cyclical variation which
causes paper profits to become deficits overnight, which
produces a public psychology which liquidates assets and casts
doubt on the value of even the best investments as the wheels
of industry suddenly slow down." 1

A business cycle may also be referred to as a 'four phase'
cycle, composed of prosperity, recession, depression, and
recovery. This swing from prosperity to recovery and back
again to prosperity varies both in time and intensity. Statistical
techniques are employed to isolate these disruptive oscillations
and to analyse the conditions surrounding them, so that
the Government may take such measures as are feasible to
prevent the deterioration of mild recessions or early crises into
deep depressions or collapsing economy, and keep the upswings
of prosperity within reasonable limits without developing into
stormy speculations. Such an analysis is of great interest to the
business enterprise also, as it can modify its programmes in
accordance with the analysed predictions of the cyclical swing.
When these isolated data are graphically represented, they will
give the curve a zig-zag shape, and not only will the cycles
become apparent, but variations in duration and amplitude
will also be clearly evident.

Seasonal Variations. It is common knowledge that consumption
and production of many commodities, interest rates, bank
clearings, etc., are marked by seasonal swings. Climate and
custom together play an important role in giving rise to seasonal
movements in almost all the industries : primary, secondary and
tertiary. The yearly cycle of the weather directly affects agricultural
production and agricultural marketing. There is almost a
definite and limited period of growing and so also of the harvesting
of crops. This more or less regularly recurring period of
harvesting and marketing every year considerably affects the

1 Neiswanger, Elementary Statistical Methods.





manufacturing, transport and other industries. Banks are called
upon to provide a seasonal increase in their credit, and the
railroads are expected to be ready to have their peak loads because
of harvesting. Even the retail and wholesale dealers feel the
effect of the seasonality of farmers' incomes as their purchases
tend to concentrate mostly during the period of their marketings.
Though the direct and main impact of nature is
on agriculture, yet climatic conditions, including variations in
rainfall, humidity, sunshine, heat, wind and snow, do produce
variations in almost all other types of industries.

Another factor responsible for most of the seasonal variations
in time series is custom. Most of the festivals, holidays
and marriages are attributable to customary practices and many
lines of retail trade in particular reach their peaks in these
limited periods, regularly recurring year after year. As customs
seldom die and seasons have little change in their periodicity,
the variations produced by customs and seasons are more or
less periodically regular. These more or less regular intra-year
(within the year) movements recurring year after year are
called seasonal variations.

Though seasonal variation generally deals with yearly
movements, yet periodic movements may also be characterised
as intra-month, intra-week, intra-day, etc. The peak activity
of a commercial bank around the first of each month may be
cited as an example of intra-month movement. If we have the
figures of the sales of retail stores in an industrial town day by
day and if the intra-week sales are observed, it will be found
that the sales are mostly concentrated on the local weekly pay
day or the day after, while the sales of the remainder of the
week are very low. This type of periodic movement is
repeated almost with the same regularity week after week, as
the seasonal movement is repeated year after year. The business
of a restaurant and the figures of automobile accidents
in metropolitan centres are some of the examples of the
intra-day movements, as these data will show the greatest
concentration in the evening hours.

Yet another type of variation though fairly regular and 




repetitive, is caused by the use of the Gregorian Calendar. There
is a 10 per cent range in the number of days falling between
the shortest month (28 days) and the longest month (31 days).
This variation is much more marked when the number of
working days of different months is taken into account. Naturally
such a difference in the working days must lead to variations
in the month to month figures of production, sales, etc.
But these changes are frequently repetitive and their influence
upon the specific time series is fairly regular, both in respect of
timing and amplitude.

Thus the problem is essentially one of discovering what the
regularities are. One may be interested in seasonal variations
either because one wishes to separate the variations which are
purely seasonal in order to study business cycles, etc., or
because one may be interested in the seasonal variation itself.
The objectives behind the interest in seasonal variation itself
may be :

1. to take advantage of it by purchasing such seasonal commodities
as can be preserved, at the peak of the season
when not only the price is low but the quality may also be high ;

2. to iron out the seasonal so that the intra-year variation
may be less marked, e.g., on the side of sales, by making
effective advertisements, etc., and on the production side by
stimulating increased production in the off season ;

3. to curtail the activities of seasonal nature, say, of a
manufacturing concern, by producing commodities with
complementary seasonals.

Irregular or Random Fluctuations

The fourth and last component of time series is obtained
after segregating the secular, seasonal and cyclical variations
from the original data of the time series, and is variously
known as residual, random, irregular or accidental fluctuation.
These variations are substantially different from the other
three components as these movements are not only erratic but
are also unpredictable. They do not reveal any pattern of the




repetitive tendency. Often these residual values show a substantial
rise or fall attributable mostly to numerous non-recurring
and irregular circumstances, such as wars, prolonged
strikes, revolutions, locust invasions, conflagrations, famines,
epidemics, droughts, floods, earthquakes, severe storms and
other acts of God. Seldom could any of these specific forces
even crudely be foreseen, and as such the question of prediction
scarcely arises. Naturally, these complex fluctuations remain
mostly unanalysed. Further, cyclical fluctuations may sometimes
be generated by these episodes and consequently it may
often be difficult to distinguish and isolate the episodic fluctuations
from the cyclical variations.

General Statement of the Nature of Time Series 

The task of the statistical analysis of time series lies in
isolating each of the four components, viz., the trend, the
seasonal, the cyclical and the irregular movements, from the
original composite series, so that each of these components may
be measured, examined, analysed and described independently
of the others, i.e., by using the scientific procedure of 'holding
other things constant'.

This composite series is symbolised by the following general
terms :

$$O = T \times S \times C \times I$$

where O symbolizes the original data, and T for trend, S for
seasonal, C for cyclical and I for irregular components have
been used. 1


1 A measure of one or more of these components can be obtained by
eliminating the remaining elements from the original data of the time series
by the process of division. For instance CI, for annual data, will be

$$\frac{O}{T} = \frac{T \times C \times I}{T} = CI$$

(For annual data, the influence of seasonal variation, and consequently its
elimination, won't arise.)

In case of monthly data, the influence of seasonal variation may
generally be present, and CI will be :

$$\frac{O}{TS} = \frac{T \times C \times S \times I}{TS} = CI$$
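The process of division described in this footnote can be sketched in a few lines of Python. The figures are invented purely for illustration, the trend values are assumed to have been estimated already by some method, and `remove_trend` is a hypothetical helper name, not a standard routine.

```python
# Hypothetical annual figures, invented for illustration only.
original = [110.0, 126.0, 117.0, 143.0]   # O: the observed annual values
trend = [100.0, 105.0, 110.0, 115.0]      # T: trend values, assumed already estimated


def remove_trend(observed, trend_values):
    """Divide each observed value by its trend value.

    For annual data there is no seasonal element, so O / T leaves C x I.
    """
    return [o / t for o, t in zip(observed, trend_values)]


cyclical_irregular = remove_trend(original, trend)
print([round(ci, 3) for ci in cyclical_irregular])
```

A ratio above 1 marks a year standing above its trend value and a ratio below 1 a year standing below it. Multiplying each ratio by its trend value gives back the original figure.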




It may be stated at this stage that though the analysis of 
time series involves the disentanglement of the 4 components, 
yet one should not conclude that the process and techniques 
described above are absolutely accurate and definite. Even the 
seasonal variation can rarely be precisely repeated every year. 

Measurement of Trend

There are a variety of methods of describing the trend.
We will discuss them one by one.

Free Hand Method. The simplest, quickest and easiest method
of estimating the secular trend is to plot the original data on a
graph and then to draw a smooth curve through the points so
that it may accurately describe the general long-run tendency
of the data. While drawing such a curve the minor short-run
fluctuations or abrupt variations are not taken into account.
Use of flexible rulers may be made while drawing the
smoothed line. The smoothed line may also be drawn along
a straight ruler if the secular variation seems to the eye to be
linear. If the line indicates the long-run tendency, then the
distance of the trend line from any point on the original curve
becomes immaterial.

The obvious disadvantage of such a method is that no two 
persons would draw identical trend lines through the same 
original graph. Thus this is an unsatisfactory method of 
estimating trend. 

The Semi-average Method. Another method of describing the
secular trend is to divide the original data into two equal parts.
The values of each part are then summed up and averaged.
The average of each part is centred in the period of time of the
part from which it has been calculated, and then plotted on
the graph. Thus a line may be drawn to pass through the
plotted points. But this also is an unsatisfactory method of
estimating trend, and can be used only for making tentative
approximations of the secular change.
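A minimal sketch of the semi-average computation follows, applied to the cement output figures of Table 16.1. The function name `semi_averages` is hypothetical, and leaving out the middle item when the number of items is odd is a convention assumed here, not one stated in the text.

```python
def semi_averages(years, values):
    """Return the two semi-average points and the yearly change of the line joining them."""
    n = len(values)
    half = n // 2
    # Assumption: with an odd number of items the middle item is left out of both halves.
    first_years, first_values = years[:half], values[:half]
    second_years, second_values = years[n - half:], values[n - half:]
    # Each average is centred on the middle of the period it was calculated from.
    point_1 = (sum(first_years) / half, sum(first_values) / half)
    point_2 = (sum(second_years) / half, sum(second_values) / half)
    slope = (point_2[1] - point_1[1]) / (point_2[0] - point_1[0])
    return point_1, point_2, slope


years = list(range(1946, 1954))
output = [1542, 1447, 1552, 2102, 2612, 3195, 3537, 3567]  # '000 tons, Table 16.1

point_1, point_2, slope = semi_averages(years, output)
print(point_1, point_2, slope)
# (1947.5, 1660.75) (1951.5, 3227.75) 391.75
```

The two points fall midway between 1947 and 1948 and midway between 1951 and 1952, and the line through them rises by 391.75 thousand tons a year.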

The Moving Average Method. According to this method the
trend is found by smoothing out the fluctuations of the data by
means of a moving average.





The moving average is a series of successive averages
secured from a series of values by averaging groups of n
successive values of the series. These groups are composed
as follows. The first group consists of the first n items of the series,
the second group consists of the items from the second to the
(n+1)th, the third consists of items from the third to the (n+2)th,
and so on. These averages give us the trend values for the
middle period of each group from which they have been
computed. Table 16.1 shows the results when a three-year
moving average is computed. The three-year moving average
means that the number of items included in the groups is
three. It will be called a five-year, seven-year or nine-year
moving average if the items included in the group are five,
seven and nine respectively.

TABLE 16.1

Three-year moving average applied to the data of cement
in India in '000 tons

Year                Output    Three-year moving    Three-year moving
                              total                average
                    A         B                    C
1946                1,542
1947                1,447     4,541                1,513.7
1948                1,552     5,101                1,700.3
1949                2,102     6,266                2,088.7
1950                2,612     7,909                2,636.3
1951                3,195     9,344                3,114.7
1952                3,537     10,299               3,433.0
1953 (Estimated)    3,567


The moving average is calculated as follows :

The values for the first three years (1946, 47, 48) are added
together and written in col. B against the mid-year 1947. This
total is divided by three and the quotient written in col. C.
Then the value of the first year (1946) is dropped and the
next three values are summed. This sum is written in col. B
against 1948. Then the value of the second year (1947) is
dropped and the next three values are totalled and averaged.

Thus this process of totalling and averaging will continue till
the series ends.
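The totalling and averaging just described can be sketched as below, using the cement output of Table 16.1. The function name `moving_average` is chosen here for illustration.

```python
def moving_average(values, n):
    """Return (totals, averages) for every group of n successive values.

    The first result belongs to the middle period of the first group,
    the second to the middle period of the second group, and so on.
    """
    totals = [sum(values[i:i + n]) for i in range(len(values) - n + 1)]
    averages = [total / n for total in totals]
    return totals, averages


years = list(range(1946, 1954))
output = [1542, 1447, 1552, 2102, 2612, 3195, 3537, 3567]  # '000 tons, col. A

totals, averages = moving_average(output, 3)
# For a three-year average the middle years run from the second year to the last but one.
for year, total, average in zip(years[1:-1], totals, averages):
    print(year, total, round(average, 1))
```

The printed totals and averages correspond to cols. B and C of the table, with no entry for 1946 or 1953.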


The calculated moving averages recorded in col. C, when
plotted on a graph (Fig. 16.1), will give us the trend of cement
production in the Indian Union.



Fig. 16.1 


The device of moving average can be advantageously
employed for removing variations of a periodic type. So this
device is best suited for data which are characterised by
periodic movements. Under certain circumstances, this method





fully wipes off the periodic fluctuations and leaves the general
trend of the data. But it is essential that the period selected
for the moving average must coincide with the length of the
cycle. If the former period differs from the latter, this method
will not completely eliminate the cyclical movements. Very
often the cycles would not be of a uniform time duration. 1 In such
a situation the statistician is faced with the problem of selecting
the proper period for calculating moving averages. It is
suggested that in such cases we should take a moving average
period equal to or somewhat greater than the average period
of the cycle in the data.
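The need for the period of the moving average to coincide with the length of the cycle can be checked on an artificial series. In the sketch below the series is invented: a straight-line trend with a perfectly regular five-year cycle added to it. The additive form is an assumption made here only to keep the arithmetic exact.

```python
def moving_average(values, n):
    """Averages of every group of n successive values."""
    return [sum(values[i:i + n]) / n for i in range(len(values) - n + 1)]


# Invented data: linear trend plus a regular cycle of period 5 whose values sum to zero.
trend = [100 + 10 * t for t in range(15)]
cycle = [0, 8, 3, -3, -8]
series = [trend[t] + cycle[t % 5] for t in range(15)]

five_year = moving_average(series, 5)    # period equal to the length of the cycle
three_year = moving_average(series, 3)   # period shorter than the cycle

# Largest departure of each moving average from the true trend at the middle period.
error_five = max(abs(m - t) for m, t in zip(five_year, trend[2:-2]))
error_three = max(abs(m - t) for m, t in zip(three_year, trend[1:-1]))
print(error_five, round(error_three, 2))
```

With the five-year period every group contains one complete cycle, so the cyclical element cancels and the trend is recovered. With the three-year period part of the cycle is left in the averages.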

TABLE 16.2

Trend estimation by method of moving averages

Year    Annual    Three-year    Five-year    Seven-year
        values    moving        moving       moving
                  average       average      average
1932    260
33      105       163.0
34      124       137.3         152.4
35      183       132.3         136.6        176.9
36      90        151.3         174.6        169.7
37      181       188.7         191.8        176.1
38      295       228.7         185.2        197.0
39      210       218.3         221.2        188.0
40      150       210.0         209.0        205.1
41      270       180.0         192.0        229.3
42      120       200.0         220.0        224.3
43      210       226.7         242.0        229.7
44      350       273.3         237.6        265.4
45      260       286.0         293.6        259.7
46      248       302.7         297.6        271.9
47      400       292.7         268.6        294.7
48      230       278.3         290.6        280.9
49      205       268.3         291.6        288.7
50      370       276.0         274.6        317.6
51      253       312.7         318.6
52      315       339.3
53      450



1 The time-duration of a cycle is also called the periodicity of the cycle.
Hence the period of moving average must be equal to the periodicity of
the cycle in the data.





Centering a Moving Average

When the period selected for the computation of moving
average consists of an odd number (3, 5, 7, 9, 11, etc.) of years
or months there is no problem of centering it. The average
obtained is written against the middle of the period as shown
before. But when the period consists of an even number (2, 4,
6, 8, 10, etc.) of years, months or weeks there arises the problem
of centering it. Thus in table 16.3 the average of four years
(1946, 47, 48 and 49) cannot be written against the second or
the third year, since it falls between December 31, 1947 and
January 1, 1948 and does not represent either 1947 or 1948.

TABLE 16.3

Production of cloth in India 1946-53
Four-year moving average

Year                Output in        Four-year         Moving average
                    million yards    moving average    centered
1946                3,908
1947                3,762
                                     3,973.0
1948                4,318                              3,942.5
                                     3,912.0
1949                3,904                              3,951.3
                                     3,990.5
1950                3,664                              4,025.5
                                     4,060.5
1951                4,076                              4,191.7
                                     4,323.0
1952                4,598
1953 (Estimated)    4,954



In order to get over this difficulty we adjust or shift these
averages so that they may coincide with years. This is done
in the following manner :


1. Compute the four-year averages in the usual manner
and place them in between two years ; thus the first
average is written between years 1947 and 1948, the
second between 1948 and 1949, and so on.







2. Compute the two-year moving average of the averages
obtained in step one, and write it against the middle of
the four-year averages. This will make them coincide with
years. Thus the first average will be written against
1948, the second against 1949, and so on.

The first and second four-year moving averages are added
and divided by two and the quotient is recorded in the next
column against the year 1948. This process of averaging the
pairs of moving averages is termed "centering the moving
average".
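The two steps of centering can be sketched as follows, with the cloth output of Table 16.3 as read here. The function name `centered_moving_average` is hypothetical.

```python
def centered_moving_average(values, n):
    """Centered moving average for an even period n.

    Step 1: plain n-item averages, each falling between two periods.
    Step 2: average of each adjacent pair, which falls on a period.
    """
    plain = [sum(values[i:i + n]) / n for i in range(len(values) - n + 1)]
    centered = [(plain[i] + plain[i + 1]) / 2 for i in range(len(plain) - 1)]
    return plain, centered


years = list(range(1946, 1954))
cloth = [3908, 3762, 4318, 3904, 3664, 4076, 4598, 4954]  # million yards

plain, centered = centered_moving_average(cloth, 4)
# For a four-year period the first centered value belongs to the third year (1948).
for year, value in zip(years[2:], centered):
    print(year, value)
```

Carried to two decimal places the values for 1949 and 1951 come to 3,951.25 and 4,191.75.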

The moving average method will give a correct picture of
the general long-run tendency of the data only under certain
conditions. These conditions are :

(1) The trend must be linear or approximately so.

(2) Cyclical variations affecting the data must be regular
in their time-duration and their amplitude.

If the data contain cyclical influences which are
irregular in their periodicity and amplitude, the moving average
method will not completely remove the cyclical influences and
hence it cannot display a good picture of the general long-run
movement.

If the basic trend in the data is not linear, this method will
produce a bias in the trend. As Waugh has pointed out, "If
the trend line is concave upward (like the side of a bowl), the
value of the moving average will always be too high ; if the
trend line is concave downward (like the side of a derby hat),
the value of the moving average will always be too low."
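The bias described by Waugh can be seen on a small artificial example. The two curves below are invented for illustration: one bends upward like the side of a bowl, the other is its mirror image.

```python
def moving_average(values, n):
    """Averages of every group of n successive values."""
    return [sum(values[i:i + n]) / n for i in range(len(values) - n + 1)]


# Invented curves with no cycle at all, so any difference is pure bias.
concave_upward = [t * t for t in range(9)]
concave_downward = [-(t * t) for t in range(9)]

# Three-year moving average minus the true value at the middle period.
bias_upward = [m - v for m, v in
               zip(moving_average(concave_upward, 3), concave_upward[1:-1])]
bias_downward = [m - v for m, v in
                 zip(moving_average(concave_downward, 3), concave_downward[1:-1])]

print([round(b, 3) for b in bias_upward])
print([round(b, 3) for b in bias_downward])
```

For the upward-bending curve every moving average stands above the true value, and for the downward-bending curve every one stands below it.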

The moving average method contains another disadvantage
also, viz., it cannot be extended to the extremes of the period
in the data. While computing a three-year moving average, we
have to neglect one year at both extremes. If, on
the other hand, we had computed a five-year moving average, we
should have had to neglect two years at both extremes. This
type of defect can be avoided if the moving average curve is
extended both ways by free hand.



The Method of Least Squares

Another method, perhaps the most commonly employed
and a very satisfactory one, of describing the trend is by means of
an objectively determined mathematical equation. The type
of mathematical equation that will be used in any particular
time series will depend upon the type of the trend, for the
trend may show a constant change in one direction or in more
than one direction. Usually a linear mathematical equation
will be applied if there is a one-direction change in the trend,
and otherwise a non-linear equation will be used. In the
former case, whether the increase or decrease is by a constant
absolute or a constant relative amount, the fitted
trend line may be a straight line. The straight line arithmetic
trend will be used for a constant absolute amount of change
and the geometric straight line is used for a constant percentage
change. In order to get a straight line for both the absolute and
relative changes, the trend values of the former should be
plotted on the natural scale and of the latter on the
semi-logarithmic scale. But when the non-linear mathematical
equation is used, the shape of the fitted line will be a curved
one and not a straight line (which is obtained by the application
of a linear mathematical expression).

Both the linear and non-linear equations belong to the same
family of simple polynomials or parabolas, formed of :

1st degree parabola, and the mathematical equation
thereof :

$$Y = a + bX$$

2nd degree parabola and equation thereof :

$$Y = a + bX + cX^2$$

3rd degree parabola and its equation :

$$Y = a + bX + cX^2 + dX^3$$

nth degree parabola and its equation :

$$Y = a + bX + cX^2 + \dots + rX^n$$

where Y is the computed or trend value of the dependent variable,
i.e., of the Y series, X is the independent variable, i.e., the time
unit of the X series, and a, b, c, d are referred to as unknowns




THE ANALYSIS OF TIME SERIES 


423 


(as their values are wot given in the series but are required to 
be determined), which are also called as constants, since once 
their values are determined for the concerned series, they do 
not change. 

The first degree parabola will give a straight line trend and 
the remaining ones will produce curved lines with varying 
bends depending upon the degree of the parabola. 

Ordinarily the line or the curve is said to fit the data best 
when the plotted dependent variables of the time series are at 
a minimum distance from the computed trend line. This is 
the guiding criterion to determine not only the type of line or
curve that fits the data best but also to determine just where
the line or the curve should be placed with reference to the 
values in the time series. 

The Principle of Least Squares 

The principle or the method of least squares provides a
convenient device to obtain the straight line of best fit. This
method can also be applied to the more complex trend types,
viz., to the non-linear trend curves. A trend line computed by
the method of least squares is such that the sum of the squares
of the deviations between the original data, i.e., the dependent
variables of the time series, and the corresponding computed
trend values is a minimum. If the curve fits the data perfectly,
each point will be on the curve and the sum of the squares of
the deviations will be zero, as each of the individual
deviations, and necessarily the sum of the deviations between
the original and the corresponding computed values, will be
zero. The farther the points of the original data lie from the
trend, the larger will be the magnitude of the individual
deviations (here also the sum of the deviations will be zero, as
the sum of the positive deviations will just equal the sum of
the negative deviations and thus cancel each other), although
the line may still be fitted so that the sum of the squared
deviations will be a minimum as compared to the sum of the
squared deviations from any other straight line. This straight
line of best fit is also called 'the least squares line' and
gives an objective fit to the trend of the series, the one which
is best or at least the most useful, as against the subjective
fitting of a number of straight lines, varying according to
personal estimates, in fitting the trend by the Free-hand
Method.
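The two properties just described (deviations summing to zero, and a smaller sum of squared deviations than any other straight line) can be checked numerically. The sketch below is only an illustration: it borrows the seven yearly sales figures of Table 16.4 and the least squares constants a = 32/7 and b = 24/28 worked out for them later in this chapter, while the rival lines are hypothetical choices made up for the comparison.

```python
# Yearly sales from Table 16.4 ('000 tons), with X = 0 for 1950.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [3, 7, 6, 8, 9, 7, 10]


def sum_of_deviations(a, b):
    """Sum of (actual - trend) for the line Y' = a + bX."""
    return sum(y - (a + b * x) for x, y in zip(xs, ys))


def sum_of_squared_deviations(a, b):
    """Sum of (actual - trend)^2 for the line Y' = a + bX."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Least squares constants for these data (a = 32/7, b = 24/28).
a_ls, b_ls = 32 / 7, 24 / 28
least = sum_of_squared_deviations(a_ls, b_ls)

# Hypothetical rival lines, chosen only for comparison.
rivals = [(4.0, 1.0), (5.0, 0.8), (3.0, 1.2), (4.571, 0.9)]
rival_sums = [sum_of_squared_deviations(a, b) for a, b in rivals]

print("sum of deviations:", round(sum_of_deviations(a_ls, b_ls), 6))
print("least squares line:", round(least, 4))
for (a, b), s in zip(rivals, rival_sums):
    print("rival line a=%s, b=%s:" % (a, b), round(s, 4))
```

Every rival line gives a larger sum of squared deviations than the least squares line, which is what the principle asserts.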

The use of the straight line trend on the natural scale
means that whatever the short-term ups and downs of the
series, the long-term drift of the data shows a constant increase
or decrease from year to year, and the successive trend values
are in arithmetic progression.

Normal Equations

The mathematical expression for fitting the arithmetic straight
line, which is the simplest of the mathematical equations of the
curves, is:

    $Y' = a + bX$

As already stated, a and b are the unknowns and the task
of obtaining an expression for the trend is simply that of
determining the values of the constants a and b in the equation.
To compute the values of these two unknowns, it is necessary,
following the well-known principle of algebra, to have two
different equations involving a and b so that these may be
solved simultaneously. For this the mathematical theory of least
squares gives the so-called 'normal equations':

    $\Sigma Y = Na + b\Sigma X$
    $\Sigma XY = a\Sigma X + b\Sigma X^2$

In these equations, Y represents the dependent variable in
the time series, N stands for the number of items and X
represents the unit of time.

An easy way to memorise these normal equations is to write down
the general equation with the original value of Y instead of the
computed value Y', i.e., to write $Y = a + bX$.

Multiply each term of the equation by 1, since the coefficient
of a is 1, and let all the values of this equation be summed up;
the resulting equation will produce the first normal equation:

    $\Sigma Y = \Sigma a + b\Sigma X$, or $\Sigma Y = Na + b\Sigma X$

(since the sum of a, i.e., $\Sigma a$, equals the value of a, N
times, i.e., $Na$).

By multiplying the equation ($Y = a + bX$) through by X, since
the coefficient of the second unknown (b) is X, we obtain

    $XY = aX + bX^2$

By summing up these values, we obtain the second normal
equation:

    $\Sigma XY = a\Sigma X + b\Sigma X^2$

By solving these two normal equations, the values of the two
unknowns may now easily be determined and the trend line fitted.
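For readers who want to see where the two normal equations come from, the following is a brief sketch of the usual calculus argument. It is an addition for illustration and is not the memorising device used in the text; the symbol S for the sum of squared deviations is introduced here only for convenience.

```latex
% S is the sum of squared deviations of the actual values Y
% from the trend values a + bX.
S(a, b) = \Sigma (Y - a - bX)^2

% S is a minimum where both partial derivatives vanish.
\frac{\partial S}{\partial a} = -2\,\Sigma (Y - a - bX) = 0
  \quad\Longrightarrow\quad \Sigma Y = Na + b\,\Sigma X

\frac{\partial S}{\partial b} = -2\,\Sigma X (Y - a - bX) = 0
  \quad\Longrightarrow\quad \Sigma XY = a\,\Sigma X + b\,\Sigma X^2
```

The first condition also shows why the deviations from a least squares line always sum to zero.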

Computing the Trend 

The application of the arithmetic straight line method for
the computation of the trend for the sales in thousands of
maunds of cereals in a certain store from 1950 to 1956 is
illustrated in Table 16.4.

The equation applied is of the type $Y' = a + bX$.

The two normal equations are:

    (i)  $\Sigma Y = Na + b\Sigma X$
    (ii) $\Sigma XY = a\Sigma X + b\Sigma X^2$

In order to solve for a and b, it is necessary to obtain the
following values: $\Sigma X$, $\Sigma Y$, $\Sigma XY$, $\Sigma X^2$
and N.

Since the numbering system for the years is simply arbitrary
and, further, the numbers of the years 1950, 1951, etc., are
inconvenient for calculating purposes, these numbers are replaced
by the simpler consecutive numbers 0, 1, 2, etc. The year
against which 0 is placed is known as the year of origin.


TABLE 16.4

Year    X     Actual sales    Product of     X^2    Trend or estimated sales
              of cereals      (2) and (3)           Y' = 4.571 + .857X
              ('000 tons)     XY
              Y
(1)     (2)   (3)             (4)            (5)    (6)

1950    0      3                0             0     4.571 + .857(0) = 4.57
1951    1      7                7             1     4.571 + .857(1) = 5.43
1952    2      6               12             4     4.571 + .857(2) = 6.29
1953    3      8               24             9     4.571 + .857(3) = 7.14
1954    4      9               36            16     4.571 + .857(4) = 8.00
1955    5      7               35            25     4.571 + .857(5) = 8.86
1956    6     10               60            36     4.571 + .857(6) = 9.71

N = 7   ΣX    ΣY              ΣXY            ΣX^2   Total of
        = 21  = 50            = 174          = 91   trend values = 50

Note: The above data are for 7 years and therefore N = 7.





Substituting the values in the normal equations

    $\Sigma Y = Na + b\Sigma X$ .......... (i)
    $\Sigma XY = a\Sigma X + b\Sigma X^2$ .......... (ii)

we get

    $50 = 7a + 21b$ .......... (iii)
    $174 = 21a + 91b$ .......... (iv)

Multiplying the equation (iii) by 3, we get

    $150 = 21a + 63b$ .......... (v)

Deducting (v) from (iv), we obtain

    $24 = 28b$; or $28b = 24$; or $b = \frac{24}{28} = .857$

Substituting the value of b in equation (iii), we get

    $50 = 7a + 18$; or $7a = 32$; or $a = \frac{32}{7} = 4.571$

Thus the equation of our best line for the above series is

    $Y' = 4.571 + 0.857X$

In interpreting the equation it is necessary to mention the
year of origin, the time unit and the unit of the dependent
variable, so that full information may be obtained. Thus, for
the equation just found:

    $Y' = 4.571 + 0.857X$
    Time unit : one year
    Year of origin or zero point : 1950
    Unit of dependent variable : thousand tons.
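The arithmetic of this illustration can be reproduced in a few lines. The following is a minimal sketch, assuming the Table 16.4 figures (sales of 3, 7, 6, 8, 9, 7 and 10 against X = 0 to 6); the function name fit_straight_line is hypothetical and simply solves the two normal equations by elimination, as done by hand above.

```python
def fit_straight_line(xs, ys):
    """Solve the two normal equations for a and b in Y' = a + bX."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Eliminate a between the two normal equations, then back-substitute.
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


# Table 16.4: X counted from the year of origin 1950.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [3, 7, 6, 8, 9, 7, 10]

a, b = fit_straight_line(xs, ys)
trend = [a + b * x for x in xs]

print("a =", round(a, 3), " b =", round(b, 3))
print("trend values:", [round(t, 2) for t in trend])
print("total of trend values:", round(sum(trend), 2))
```

The total of the trend values agrees with the total of the actual sales, as the text goes on to observe.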

With the help of the above solved equation, the trend values
for the Y series can be found just by substituting the
appropriate values of X in the trend equation, as shown in the
6th column of table 16.4. It will be noted that the aggregate
value of the actual sales tallies with the total of the trend
estimates, though the individual values of the actuals and the
trends differ. The reason is obvious. In the actuals, there may
be ups and downs, but the trend deals with the overall growth or
decline, and that is why we find that the trend values are
constantly rising here, though in the actuals we find even a
decline in the years 1952 and 1955, so that there is no question
of a uniform rise throughout the whole period.

These trend values, if plotted on graph paper, will give a
straight line and that is why the equation with which these
trend values have been computed is known as a straight line
equation. In plotting these trend values, at least two or
preferably three (no need of all) of these trend values must be
computed and joined by a straight line.

Short Method of Arithmetic Straight Line 

In table 16.4, the year 1950, the first year with which
the X series starts, has been taken as the origin year, i.e., the
zero point, and to obtain the unknown constants, the solution
of the two simultaneous normal equations has been somewhat
longish. The arithmetic involved in these two simultaneous
equations can be greatly simplified if the origin is shifted from
the first or any other year to the middle year of the time
series. Because the time unit of the X series is uniformly
spaced, i.e., it increases by a single unit from one year (or
half year, etc.) to the next throughout the whole data, the sum
of the X's will be zero ($\Sigma X = 0$) if the zero point is
placed exactly at the middle instead of at the first or any other
year of the time series, and all the years preceding this origin
year are consecutively given the numbers -1, -2, -3, etc., and
so also all the years succeeding this origin year are
consecutively designated as 1, 2, 3, etc. Here the value of
$\Sigma XY$* will also be smaller, as the XY values for the
years preceding the middle year will be negative, and their sum
will be deducted from the positive sum of the XY values of the
succeeding years. So also the value of $\Sigma X^2$ will
considerably be reduced.

Thus, placing the zero value at the middle year reduces
the sum of the X values to zero and consequently the solution
of the normal equations is made very easy. For, if zero is put
against the $\Sigma X$ wherever occurring in the two normal
equations, these equations will become:

    $\Sigma Y = Na + b(0)$ or $\Sigma Y = Na$
    $\Sigma XY = a(0) + b\Sigma X^2$ or $\Sigma XY = b\Sigma X^2$

* $\Sigma XY$ will not be zero, as $\Sigma XY$ is the sum of the
products of the individual values of X and Y and is not the
product of $\Sigma X$ and $\Sigma Y$.

Now in each of these two simplified equations, there is only
one unknown constant and so the values of a and b
may directly be found. Thus this simple or short-cut device
considerably reduces the labour and time involved in
calculations.

The expression $\Sigma Y = Na$ may be written as

    $a = \frac{\Sigma Y}{N}$

This should be recognized as the formula for the arithmetic
mean of the Y series, and so a is the arithmetic mean of the Y
series in this short-cut method.

Similarly the expression $\Sigma XY = b\Sigma X^2$ may be
written as

    $b = \frac{\Sigma XY}{\Sigma X^2}$

This value of b represents the average amount of change in
the secular movement for one unit of time. As the trend values
of Y are obtained by substituting the pertinent values of X in
the equation $Y' = a + bX$, it can be said that the trend line
fitted by the straight line method is, in a sense, analogous to
the arithmetic average, except that the arithmetic average is a
single value, rather than a series of values, summarising a
group of data. Both possess the characteristics of (a) the sum
of the deviations of the values from them being zero, and (b)
the sum of the squares of all these deviations being less than
the sum of the same from any other value.

Computation of Trend by Short Method: Odd Number
of Years

Taking the original values of the actual sales for the years
1950-56 given in table 16.4, the application of the short-cut
method may be illustrated in the following table:





TABLE 16.5

Year    X     Actual sales    Product of     X^2    Trend or estimated sales
              of cereals      (2) and (3)           Y' = 7.143 + .857X
              ('000 tons)     XY
              Y
(1)     (2)   (3)             (4)            (5)    (6)

1950    -3     3               -9             9     7.143 + .857(-3) = 4.57
1951    -2     7              -14             4     7.143 + .857(-2) = 5.43
1952    -1     6               -6             1     7.143 + .857(-1) = 6.29
1953     0     8                0             0     7.143 + .857(0)  = 7.14
1954     1     9                9             1     7.143 + .857(1)  = 8.00
1955     2     7               14             4     7.143 + .857(2)  = 8.86
1956     3    10               30             9     7.143 + .857(3)  = 9.71

N = 7   ΣX    ΣY              ΣXY            ΣX^2   Total of the
        = 0   = 50            = 53 - 29      = 28   trend values = 50
                              = 24


Substituting the values in the already arrived expressions:

    $a = \frac{\Sigma Y}{N}$ and $b = \frac{\Sigma XY}{\Sigma X^2}$

we get

    $a = \frac{50}{7}$, or 7.143

    $b = \frac{24}{28} = .857$

Thus the straight line equation for the above series becomes

    $Y' = 7.143 + .857X$

    Time unit : One year
    Year of origin : 1953
    Unit of dependent variable : thousands of maunds.









Looking at the trend values of both the above tables, we
find them exactly identical and that is why the application of
the short method is strongly favoured in the time series.
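The agreement between the two methods can be confirmed by computing both side by side. This is a small sketch under the assumption that the sales figures are those of tables 16.4 and 16.5; the variable names are illustrative only.

```python
years = list(range(1950, 1957))
sales = [3, 7, 6, 8, 9, 7, 10]
n = len(years)

# Short method: origin at the middle year, so that the sum of X is zero.
middle = years[n // 2]
xs_short = [year - middle for year in years]
a_short = sum(sales) / n
b_short = (sum(x * y for x, y in zip(xs_short, sales))
           / sum(x * x for x in xs_short))
trend_short = [a_short + b_short * x for x in xs_short]

# Longer method: origin at the first year, both normal equations solved.
xs_long = [year - years[0] for year in years]
sum_x = sum(xs_long)
sum_y = sum(sales)
sum_xy = sum(x * y for x, y in zip(xs_long, sales))
sum_x2 = sum(x * x for x in xs_long)
b_long = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a_long = (sum_y - b_long * sum_x) / n
trend_long = [a_long + b_long * x for x in xs_long]

print("short method: a =", round(a_short, 3), " b =", round(b_short, 3))
print("long method:  a =", round(a_long, 3), " b =", round(b_long, 3))
print("trend values agree:",
      all(abs(s - l) < 1e-9 for s, l in zip(trend_short, trend_long)))
```

Only the constant a differs between the two fits, because it is the trend value at a different year of origin.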

Explanation of Constants a and b

It may be noted here that the value of b, also known as
the coefficient of X in the trend equation, as obtained by the
short method is identical to that computed by the longer method,
as it should be. This coefficient is of particular interest since
it shows by how much the trend rises in the case of growth, or
falls if it is a declining trend, from one unit of time (one year
in the above illustrations) to the next. It is evident from
column 6 of the above two tables that if the value of X is
increased by one unit, the value of Y' increases by an amount
equal to one times the value of b. Thus b measures the steepness
or slope of the straight line. Its sign may be positive or
negative. If it is positive, the trend will have a rising
tendency, and if negative, a declining tendency will be observed
in the trend.

The value of a, however, is larger (7.143) in the shorter
method than that (4.571) of the longer method. This is
because the origin, in the shorter method, has been shifted
from 1950 to 1953. But this change in origin has no effect
upon the trend values, as it is amply evident from both the
above tables, illustrating the application of the short and long
methods separately, that the trend values for all the years,
including, of course, the origin years of 1950 and 1953, are
exactly identical. Thus by the use of the short method the
change is effected only in the year of origin and not in the
level or slope of the line. It may also be noted that for the
year of origin, a is the trend value, as the second term of
the general equation, i.e., bX, then becomes zero. Thus a locates
the general height of the trend and is frequently referred to as
the Y intercept, i.e., the computed value of Y when X equals
zero.




Computation of Trend by Short Method: Even Number
of Years

The reader might ask how to apply the short-cut device of
taking the origin at the middle year when the X series deals
with an even number of years, instead of an odd number of years
as used in the preceding illustration, where the number of years
is 7 (1950 to 1956) and consequently no problem arises in
locating the middle year at 1953. In a time series of an even
number of years, although there is no middle year opposite which
the value X = 0 can be assigned, yet the advantage of the
short-cut device can still be had by taking the origin between
the two middle years (as the mid-point of the even series lies
between its two centre years). The first year on either side of
the origin will be assigned 0.5, with the negative sign to the
one preceding, and the positive sign to the one succeeding, the
origin. The second year on either side of the origin will
similarly be numbered 1.5 and so on. The resulting sum of X will
be zero as in the odd number series. However, the decimal
points may be cumbersome in the computations and consequently,
to eliminate these fractions, the time unit of X is changed from
one year to half a year.

To illustrate the procedure, we take the data of the first six
years given in the previous table. The two middle years, in
table 16.6, are 1952 and 1953 and the zero point will lie at the
centre of these years, viz., at the midnight falling between
31st December, 1952, and 1st January, 1953. The data for
1952 are located at 1st July of that year, which lies half a
year before the zero point. But since half a year is being used
as the time unit, the X value for 1952 is assigned -1. For the
year 1951, July 1st is 1½ years before the origin, and the
X value becomes -3. Similarly the remaining X values, both
preceding and succeeding the origin, have accordingly been
determined in table 16.6. The values of a and b have been found
and the trend values have been computed exactly in the same
fashion as explained above.





TABLE 16.6

Year    X     Actual sales    Product of     X^2    Trend or estimated sales,
              of cereals      (2) and (3)           i.e., Y' = 6.67 + .4X
              ('000 tons)     XY
              Y
(1)     (2)   (3)             (4)            (5)    (6)

1950    -5     3              -15            25     6.67 + .4(-5) = 4.67
1951    -3     7              -21             9     6.67 + .4(-3) = 5.47
1952    -1     6               -6             1     6.67 + .4(-1) = 6.27
1953    +1     8               +8             1     6.67 + .4(+1) = 7.07
1954    +3     9              +27             9     6.67 + .4(+3) = 7.87
1955    +5     7              +35            25     6.67 + .4(+5) = 8.67

N = 6   ΣX    ΣY              ΣXY            ΣX^2   Total of the
        = 0   = 40            = +70 - 42     = 70   trend values = 40
                              = 28                  (apart from rounding)

Substituting the values in the expressions:

    $a = \frac{\Sigma Y}{N}$ and $b = \frac{\Sigma XY}{\Sigma X^2}$

we get

    $a = \frac{40}{6} = \frac{20}{3} = 6.67$

    $b = \frac{28}{70} = \frac{2}{5} = .4$

Thus the straight line equation for the above even series
becomes

    $Y' = 6.67 + .4X$

    Time unit : Half a year
    Year of origin : mid 1952-53
    Unit of dependent variable : thousands of maunds.
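The half-year coding for an even number of years is easy to generate mechanically. The sketch below assumes the six yearly figures of table 16.6; the expression used to build the X values (odd numbers symmetric about zero) is one convenient way of writing the rule described above.

```python
years = list(range(1950, 1956))
sales = [3, 7, 6, 8, 9, 7]
n = len(years)  # an even number of years

# Time unit of half a year: X runs -5, -3, -1, +1, +3, +5,
# so the origin falls between the two middle years.
xs = [2 * i - (n - 1) for i in range(n)]

a = sum(sales) / n
b = sum(x * y for x, y in zip(xs, sales)) / sum(x * x for x in xs)
trend = [a + b * x for x in xs]

print("X values:", xs)
print("a =", round(a, 2), " b =", round(b, 2))
print("trend values:", [round(t, 2) for t in trend])
```

Because X is measured in half-years here, b is the change in trend per half-year; the change per year is twice as much.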








Semi-logarithmic or Geometric Straight Line Trend

The straight line $Y' = a + bX$ which has been illustrated so
far describes a constant amount of growth or decline. Many
economic and business series, like compound interest data,
tend to show a constant rate of change, instead of a constant
absolute amount of change, from year to year. Such a trend is
observed when the values of the X series are arranged in
arithmetic progression, such as 1, 2, 3, 4, etc., and the data
of the Y series are in geometric progression, such as 1, 2, 4,
8, 16, etc., showing that these values change at a constant
ratio, so that the absolute amount tends to change more rapidly
in succeeding years than in preceding years. If the trend values
of such a Y series are plotted on the natural number scale, a
linear type of trend will not appear, as these values are in
geometric progression rather than in arithmetic progression. But
when the same are plotted on a semi-logarithmic chart (the
X-axis of which is on a natural number scale and the Y-axis of
which has a logarithmic ruling), a linear type of trend makes
its appearance. This means that there is a straight line
relationship between the natural values of the X series and the
logarithmic values of the Y series. This line is called the
geometric straight line or semi-logarithmic curve.

When the trend is geometric, the resulting curve is often
referred to as the exponential curve, as the formula of this
geometric straight line trend is

    $Y' = ab^X$

(It is referred to as the exponential curve or function
because the X factor is used as a power.)

When expressed logarithmically, it takes the form:

    $\log Y' = \log a + X \log b$

To compute the specific values of a and b the method
applied is similar to that of the arithmetic straight line,
$Y' = a + bX$, considered above. The only difference to be noted
is that the logarithms of the original Y series (obtained by
reducing all the Y values to their appropriate log values),
instead of their natural numbers, are used throughout.



The normal equations, therefore, become:

    $\Sigma \log Y = N \log a + \log b\,\Sigma X$
    $\Sigma X \log Y = \log a\,\Sigma X + \log b\,\Sigma X^2$

Or when the middle year is the origin, the expressions become

    $\log a = \frac{\Sigma \log Y}{N}$

It gives the logarithmic value of the trend at the origin, which
is also the logarithm of the geometric mean of the Y series.

    $\log b = \frac{\Sigma X \log Y}{\Sigma X^2}$

It is the logarithm of the rate of change per unit of time,
i.e., the slope of the line in logarithmic terms.

To arrive at the natural numbers, these logs should be
converted into antilogs. If, in the curve $Y' = ab^X$, b is a
positive number greater than one, the trend will be upward and
the amount of change will indicate a constant percentage of
increase, and if b is less than one, though a positive number,
the trend will be downward and the amount of change will show a
constant percentage of decline. To get the annual rate of
increase, multiply the value of b by 100 and subtract 100 from
it; the resultant will be the percentage rate of growth per
year. If the antilog of log b, so stated as a percentage,
happens to be less than 100, then the same is subtracted from
100 and the resulting figure will give the yearly percentage
rate of decline.

It may be observed that the geometric straight line, too, is
a least squares fit to the logarithms of the Y values, though
not directly to the Y values. Here the sum of the squares of
the deviations between the logarithms of the original values
and the logarithmic trend values, $\Sigma(\log Y - \log Y')^2$,
will be a minimum.
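As a rough illustration of the logarithmic procedure, the sketch below fits the geometric straight line to the progression 1, 2, 4, 8, 16 mentioned earlier in this section, taking the middle item as the origin. The series is used only because its answer is obvious (the values double each period); it is not data from any of the tables.

```python
import math

ys = [1, 2, 4, 8, 16]  # geometric progression used as an example
n = len(ys)            # odd number of items, so a middle item exists
xs = [i - n // 2 for i in range(n)]  # -2, -1, 0, 1, 2

logs = [math.log10(y) for y in ys]

# Short-method expressions, applied to log Y instead of Y.
log_a = sum(logs) / n
log_b = sum(x * ly for x, ly in zip(xs, logs)) / sum(x * x for x in xs)

# Convert the logs back to natural numbers (antilogs).
a = 10 ** log_a
b = 10 ** log_b
rate_percent = b * 100 - 100

print("a =", round(a, 4), " b =", round(b, 4))
print("percentage rate of growth per period:", round(rate_percent, 2))
```

Here a comes out as the geometric mean of the series (4) and b as the constant ratio (2), i.e., growth of 100 per cent per period.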


Logarithmic Straight Line 

There may be instances where a geometric progression is
formed by the Y values when the X values are also geometrically
arranged, and the semi-log chart may not give a linear
type of trend. For these data, logarithmic paper (with
logarithmic rulings on both the axes of X and Y) should be used
to fit a linear type of trend by the method of least squares.

In terms of X and Y, the function for these geometric
progressions will be $Y' = aX^b$.

When this function is transformed in terms of logs, it
becomes $\log Y' = \log a + b \log X$, which is a straight line
in terms of log X and log Y. It may be noted here that the
origin of X (the original values of X when changed to
logarithmic values) cannot be taken at the middle of the period.
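A short sketch of the double-logarithmic fit follows. The data are hypothetical, constructed so that Y = 3X^2 exactly, and the full pair of normal equations is solved in terms of log X and log Y because, as noted above, the origin cannot be placed at the middle of the period.

```python
import math

# Hypothetical data: X in geometric progression, Y = 3 * X**2.
xs = [1, 2, 4, 8]
ys = [3, 12, 48, 192]
n = len(xs)

u = [math.log10(x) for x in xs]  # log X
v = [math.log10(y) for y in ys]  # log Y

sum_u = sum(u)
sum_v = sum(v)
sum_uv = sum(p * q for p, q in zip(u, v))
sum_u2 = sum(p * p for p in u)

# Normal equations for log Y' = log a + b log X.
b = (n * sum_uv - sum_u * sum_v) / (n * sum_u2 - sum_u ** 2)
log_a = (sum_v - b * sum_u) / n
a = 10 ** log_a

print("a =", round(a, 4), " b =", round(b, 4))
```

The fit recovers the constants used to construct the data, which is a convenient check on the arithmetic.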

Non-Linear Trends: Second Degree Curve

A linear trend may provide a reasonably good description
of the trend of a short period series of about a decade showing
an increase or decrease throughout, but a non-linear trend may
more appropriately be fitted to series for longer periods of
about a quarter or half a century, showing a turn from a rise in
the early years to a fall in later years, or exhibiting turning
points between stages of early expansion, decline and a final
attempt at recovery.

In order to represent a non-linear trend, polynomials of
degrees higher than the first (the linear equation,
$Y' = a + bX$, is referred to as a polynomial of first degree as
it involves only the first power of X) are used. The
higher-degree polynomials are obtained by adding further terms
to the linear equation involving powers of X higher than the
first. If, for example, the second power of X, i.e., $X^2$, is
added in the above equation we will obtain the formula of the
second-degree polynomial or parabola

    $Y' = a + bX + cX^2$

The method of fitting a curve of this type will also resemble
the one applied in fitting the linear trend. As in the linear
trend, so also in evolving the equation of the second (or
higher) degree parabola, the principle of least squares is
applied, viz., the condition is laid down that the sum of the
squared deviations from the curve selected shall be smaller than
the sum of the same from any other second power curve fitted to
the same data. But since there are three unknowns a, b and c
in the equation $Y' = a + bX + cX^2$, three normal equations are
required to obtain the values of these unknowns. The process
of getting these three normal equations, described below, is
very similar to that of the straight line and may also be easily
memorised.

First, write the 'type' equation, with the original value of
Y instead of the computed value Y', i.e., by writing down

    $Y = a + bX + cX^2$

Next, multiply each term of the equation by the coefficient
of each unknown constant (a, b, c) and sum up:

(i) For the first normal equation, multiply the type equation
by the coefficient of a (which is 1) and get the sum, and the
resulting equation will be of the form:

    $\Sigma Y = Na + b\Sigma X + c\Sigma X^2$

(ii) For the second normal equation, multiply the type
equation by the coefficient of b (which is X) and sum up; the
form thereof will be

    $\Sigma XY = a\Sigma X + b\Sigma X^2 + c\Sigma X^3$

(iii) For the third normal equation, multiply the type
equation by the coefficient of c (which is $X^2$) and sum up,
thereby getting

    $\Sigma X^2 Y = a\Sigma X^2 + b\Sigma X^3 + c\Sigma X^4$

In this way, we get three normal equations:

    $\Sigma Y = Na + b\Sigma X + c\Sigma X^2$
    $\Sigma XY = a\Sigma X + b\Sigma X^2 + c\Sigma X^3$
    $\Sigma X^2 Y = a\Sigma X^2 + b\Sigma X^3 + c\Sigma X^4$

By substituting the appropriate values in these normal
equations, the desired values of the unknown constants can be
obtained by the simultaneous solution of the equations and the
curve can now easily be fitted.

This method of computing the values of a, b and c can be
further simplified, as the solution of the three simultaneous
normal equations would generally involve a considerable amount
of labour. As in the linear trend, here also the short-cut
device may be used by taking the zero value of X in the middle
of the period. By taking the origin at the middle of the data,
not only the sum of X ($\Sigma X$) but also the sum of any odd
power of X, such as $\Sigma X^3$, $\Sigma X^5$, etc., becomes
zero, and therefore, by putting 0 for $\Sigma X$ and
$\Sigma X^3$ (here the highest odd power is $\Sigma X^3$) in the
above three normal equations, we get

    $\Sigma Y = Na + b(0) + c\Sigma X^2$ or $\Sigma Y = Na + c\Sigma X^2$

    $\Sigma XY = a(0) + b\Sigma X^2 + c(0)$ or
    $\Sigma XY = b\Sigma X^2$ or $b = \frac{\Sigma XY}{\Sigma X^2}$

    $\Sigma X^2 Y = a\Sigma X^2 + b(0) + c\Sigma X^4$ or
    $\Sigma X^2 Y = a\Sigma X^2 + c\Sigma X^4$

Here the value of b is obtained directly, and the values of a
and c are obtained from the two remaining simultaneous
equations, and consequently considerable labour is saved as
compared with solving the three simultaneous equations.


Computation of Trend by Second Degree Parabola 


In table 16.7, we have taken simple data for a few years
just to illustrate the procedure of calculating the trend values
for fitting a second degree curve by the short-cut method.
Substituting the values in the normal equations

    $\Sigma Y = Na + c\Sigma X^2$ .......... (i)
    $\Sigma XY = b\Sigma X^2$ .......... (ii)
    $\Sigma X^2 Y = a\Sigma X^2 + c\Sigma X^4$ .......... (iii)

we get

    $100 = 7a + 28c$ .......... (iv)
    $46 = 28b$ or $b = \frac{46}{28} = 1.64$ .......... (v)
    $334 = 28a + 196c$ .......... (vi)

Multiplying the equation (iv) by 4, we get

    $400 = 28a + 112c$ .......... (vii)

Deducting (vi) from (vii), we obtain

    $66 = -84c$, or $c = -\frac{66}{84} = -.79$

Substituting the value of c in equation (iv), we get

    $100 = 7a + 28(-.79)$ or $100 = 7a - 22$
    or $7a = 122$ or $a = 17.43$
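The solution above needs only the column totals, so it can be checked without the individual yearly figures. The sketch below assumes the totals quoted in equations (iv) to (vi): N = 7, ΣX² = 28, ΣX⁴ = 196, ΣY = 100, ΣXY = 46 and ΣX²Y = 334. The two equations in a and c are solved here by determinants rather than by the elimination used in the text.

```python
# Totals taken from equations (iv) to (vi) of the worked example.
n = 7
sum_x2 = 28
sum_x4 = 196
sum_y = 100
sum_xy = 46
sum_x2y = 334

# Second normal equation gives b directly (origin at the middle year).
b = sum_xy / sum_x2

# First and third normal equations:
#   sum_y   = n * a      + sum_x2 * c
#   sum_x2y = sum_x2 * a + sum_x4 * c
det = n * sum_x4 - sum_x2 * sum_x2
a = (sum_y * sum_x4 - sum_x2 * sum_x2y) / det
c = (n * sum_x2y - sum_x2 * sum_y) / det

print("a =", round(a, 2), " b =", round(b, 2), " c =", round(c, 2))
```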









TABLE 16.7

[The body of this table is illegible in the source; only the
column totals survive.]

N = 7; ΣX = 0; ΣY = 100; ΣXY = 98 - 52 = 46; ΣX^2 = 28;
ΣX^2Y = 334; ΣX^3 = 0; ΣX^4 = 196.

Total of the trend values = 99.89, or corrected to two
decimals = 100 (because of approximations).






Thus the respective values of the constants a, b and c are
17.43, 1.64 and -.79.

The type equation of the second degree parabola becomes

    $Y' = 17.43 + 1.64X - 0.79X^2$

The second degree parabola curve can now be fitted by
plotting the computed trend values of column (9) of table 16.7.
Here, all the values should be plotted, and not merely two or
three values as in the case of the straight line trend, for the
simple reason that the second degree equation will not give a
straight, but a curved line and consequently the desired curve
can be obtained only when all, or at least most of, the trend
values are plotted.

From column 9 of table 16.7, it is evident that the value of
the constant a in this type of equation is the value of Y' at
the origin; the value of the constant b indicates the slope at
the point where X = 0 (at the origin); and the value of the
constant c determines the extent of departure from linearity
and also indicates whether the curve is concave or convex to
the base. When the value of c is positive, the curve will be
convex, and when negative, as we find in the above illustration,
the curve will be concave to the base.

Another point that can be noted from observing the trend
values is that in the initial years there is a rising trend and
then a change is made from the rising trend, i.e., it begins to
show a declining trend. Consequently when these values are
plotted, there will be one bend in the trend, as the second
degree parabola permits the trend to change direction once,
viz., from a rising phase to a declining one, or vice versa,
depending on the sign of c. Thus, it can be said that the second
degree parabola is a flexible curve as it permits a change in
direction, and this is the reason why this curve, rather than
the linear curve which allows no flexibility, is fitted to long
period series.
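The single change of direction can be seen by evaluating the fitted equation for each year. This sketch assumes the rounded constants 17.43, 1.64 and -0.79 found above and X values of -3 to +3 (seven years with the origin at the middle year, consistent with ΣX² = 28 and ΣX⁴ = 196).

```python
a, b, c = 17.43, 1.64, -0.79
xs = [-3, -2, -1, 0, 1, 2, 3]

# Trend values from Y' = a + bX + cX^2.
trend = [a + b * x + c * x * x for x in xs]

# Year-to-year changes: positive while the trend rises, negative after.
changes = [later - earlier for earlier, later in zip(trend, trend[1:])]
peak_x = xs[trend.index(max(trend))]

print("trend values:", [round(t, 2) for t in trend])
print("year-to-year changes:", [round(d, 2) for d in changes])
print("highest trend value at X =", peak_x)
print("total of trend values:", round(sum(trend), 2))
```

The changes are positive up to the highest point and negative afterwards, so the curve bends once; the total of 99.89 falls slightly short of 100 only because the constants were rounded.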

Third Degree Curve

The third degree parabola $Y' = a + bX + cX^2 + dX^3$ is obtained
by adding one more unknown constant d (or raising X by one
more power) to the type equation of the second degree parabola,
and permits the fitted curve to change direction twice,
viz., allows two bends in the trend curve. A still more flexible
curve may be obtained by using higher degree parabolas than
the third degree, and the general formula for the type equation
will be

    $Y' = a + bX + cX^2 + dX^3 + eX^4 + \dots$ etc.

The solutions for parabolas of third or higher degree can be
arrived at by the extension of the method explained for the
parabolas of first and second degrees.

It may be indicated at this stage that the parabolas of higher
degree should be used more cautiously, as the parabolas of
fourth, fifth degree, etc., will respectively have 3, 4, etc.,
bends or changes of direction (viz., several ups and downs) in
the fitted curve, which may hardly coincide with the concept of
secular trend.

Elimination of Trend

As stated earlier, the forces affecting time series are broadly
divided into two categories, viz., long-term forces and
short-term forces. If we are interested in the long-term forces
we will study only the trend. But when it is desired to study
short-term movements we will have to get rid of the trend from
the original data, i.e., we will have to eliminate the trend
from the actual data. The task of eliminating it is easy if the
trend figures are known to us. The process involves the
deduction of trend values from the actual figures. Reverting to
table 16.2 and assuming a five-yearly moving average, we find
that the deviations from the trend for various years are:


Year      Deviation from the trend

1944          + 112.4
1945          -  33.6
1946          -  49.6
1947          + 131.4
1948          -  60.6
1949          -  85.6
1950          +  95.4
1951          -  65.6






The above figures show the amount by which the actual
figures differed from the general long-run tendency. The
deviations recorded above are taken with their proper algebraic
signs and when plotted on a chart show the short-term
influences existing in the data.
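The deduction of trend from actual figures is a one-line operation once the trend values are known. Since the figures of table 16.2 are not reproduced in this section, the sketch below illustrates the step on a different series, namely the cereal sales of table 16.4 with their least squares trend $Y' = 32/7 + (24/28)X$ in place of a moving average; this substitution is an assumption made purely for illustration.

```python
years = list(range(1950, 1957))
actual = [3, 7, 6, 8, 9, 7, 10]  # sales from table 16.4

# Least squares trend for these data, with X = 0 for 1950.
a, b = 32 / 7, 24 / 28
trend = [a + b * x for x in range(len(years))]

# Elimination of trend: actual figure minus trend value, sign retained.
deviations = [y - t for y, t in zip(actual, trend)]

for year, dev in zip(years, deviations):
    print(year, "%+.2f" % dev)
```

Positive deviations mark years in which sales stood above the long-run tendency and negative deviations years in which they fell below it.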



Fig. 16.2 

Method of measuring Seasonal Fluctuations 

The following are the different methods of measuring
seasonal variations:

(a) Simple average method,

(b) Simple averages corrected for trend,

(c) Link relative method, and

(d) Moving average method.

(a) Simple Average Method. Under this method (if we
have monthly data for a series of years) a typical figure for
each month is obtained. The steps in its computation are:

1. Average the values for each month for all the years.






TABLE 16.8

Seasonal indices by the simple average method

[Table body illegible in the source.]


The averages for different months are given just below the
totals for every month in table 16.8.

2. Compute monthly indices in the following manner :

    Seasonal index = (Average of the month × 1,200) / (Total of the monthly averages)

Thus the seasonal index for the month of January is

    (82.33 × 1,200) / 1,082.22 = 91.3, and for February (82.11 × 1,200) / 1,082.22 = 91.0


Thus column 12 gives indices of monthly fluctuation of the
variable under consideration. This is quite a simple method.
But the indices so obtained do not truly represent the seasonal
variations inasmuch as they include the trend influences also.
This method can, therefore, be usefully employed only if the
original data has no long-term tendencies.
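The two steps above can be sketched in Python as follows. The monthly totals are those printed for the nine years 1939 to 1947 in the illustration; the function name and layout are our own and are offered only as a sketch.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Monthly totals over the nine years 1939-1947, as printed in the illustration.
MONTHLY_TOTALS = [741, 739, 774, 753, 801, 816, 818, 878, 909, 948, 838, 725]
YEARS = 9


def simple_average_indices(totals, years):
    """Step 1: average each month; step 2: scale so the twelve indices total 1,200."""
    averages = [t / years for t in totals]
    total_of_averages = sum(averages)
    return [round(a * 1200 / total_of_averages, 1) for a in averages]


indices = simple_average_indices(MONTHLY_TOTALS, YEARS)
for month, index in zip(MONTHS, indices):
    print(month, index)
```

Multiplying by 1,200 rather than 100 simply makes the average of the twelve indices equal to 100.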

(b) Simple Averages Corrected for Trend. In this method the
trend is eliminated from the monthly averages and thus the
main shortcoming of the previous method is removed. The
main steps involved are :

1. Calculate the trend. In table 16.9 we have computed the
trend by the method of least squares. It gives us a yearly
increase of 3.01. This figure divided by 12 gives us a monthly
increase of .25.

2. Calculate the mean of monthly averages. In our
illustration it is 90.19.

3. This average (computed according to rule (2), 90.19)
is the trend value of the middle point of this series. In our
illustration this is the value on 1st July.

4. In order to compute the trend value for June (15th),
.25/2 = .125 is deducted from 90.19. Thus we get 90.065. To
compute for July (15th) we add .25/2 = .125 to 90.19 and
obtain 90.315.



TABLE 16.9

Seasonal Variations (Simple Average Method adjusted for trend)
Monthly data from 1939 to 1947

Year      Jan.    Feb.    March   April   May     June    July    August
1         2       3       4       5       6       7       8       9

1939      72      68      69      71      75      80      85      89
1940      82      77      84      73      86      86      90      96
1941      70      68      69      70      75      76      75      81
1942      70      76      82      72      78      84      82      87
1943      84      84      91      94      97      101     98      104
1944      85      90      91      87      89      90      89      97
1945      92      90      92      94      96      98      98      108
1946      92      91      96      95      103     102     104     110
1947      94      95      100     97      102     99      97      106

Total     741     739     774     753     801     816     818     878
Average   82.33   82.11   86.00   83.67   89.00   90.67   90.89   97.56
Trend     88.815  89.065  89.315  89.565  89.815  90.065  90.315  90.565
Seasonal
Index     92.7    92.2    96.3    93.4    99.1    100.7   100.6   107.7

Year      Sept.   Oct.    Nov.    Dec.    Total     Average (Y)  X    XY        X²
          10      11      12      13      14        15           16   17        18

1939      96      97      80      75      957       79.75        1    79.75     1
1940      97      100     88      72      1,031     85.92        2    171.84    4
1941      84      93      76      68      905       75.42        3    226.26    9
1942      93      99      94      83      1,000     83.33        4    333.32    16
1943      103     108     98      82      1,144     95.33        5    476.65    25
1944      104     109     97      84      1,112     92.67        6    556.02    36
1945      107     111     102     88      1,176     98.00        7    686.00    49
1946      115     120     107     90      1,225     102.08       8    816.64    64
1947      110     111     96      83      1,190     99.17        9    892.53    81

Total     909     948     838     725     9,740     811.67            4,239.01  285
Average   101.00  105.33  93.11   80.56   1,082.22  90.19
Trend     90.815  91.065  91.315  91.565
Seasonal
Index     111.2   115.67  102.0   88.0

Let the trend be Y = a + bX. Normal Equations are

    ΣY = Na + bΣX,    ΣXY = aΣX + bΣX²
    811.67 = 9a + 45b  and  4,239.01 = 45a + 285b.    Solving these, b = 3.01

Yearly change in averages = 3.01
Monthly change = 3.01 / 12 = .25


The method of calculating trend values for other months
has already been explained.

5. The seasonal indices have been obtained by finding the
percentage of monthly average to the secular trend. Thus for
January the seasonal index is :

    (82.33 / 88.815) × 100 = 92.7
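The five steps may be sketched in Python as below. The yearly and monthly averages are taken from table 16.9; the function names are our own, and the sketch carries the unrounded monthly increment rather than the rounded .25, so a trend value may differ from the printed one in the third decimal place.

```python
# Yearly averages (1939-1947) and monthly averages, as given in table 16.9.
YEARLY_AVERAGES = [79.75, 85.92, 75.42, 83.33, 95.33, 92.67, 98.00, 102.08, 99.17]
MONTHLY_AVERAGES = [82.33, 82.11, 86.00, 83.67, 89.00, 90.67,
                    90.89, 97.56, 101.00, 105.33, 93.11, 80.56]


def yearly_increase(averages):
    """Slope b of the least-squares line Y = a + bX with X = 1, 2, ..., n."""
    n = len(averages)
    xs = range(1, n + 1)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    sum_y = sum(averages)
    sum_xy = sum(x * y for x, y in zip(xs, averages))
    return (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)


def trend_corrected_indices(monthly_averages, yearly_averages):
    monthly_step = yearly_increase(yearly_averages) / 12
    centre = sum(monthly_averages) / 12          # trend value on 1st July
    indices = []
    for m, average in enumerate(monthly_averages):
        # The middle of month m (0 = January) lies (m - 5.5) months from 1st July.
        trend = centre + (m - 5.5) * monthly_step
        indices.append(round(average / trend * 100, 1))
    return indices


b = yearly_increase(YEARLY_AVERAGES)
indices = trend_corrected_indices(MONTHLY_AVERAGES, YEARLY_AVERAGES)
print(round(b, 2), indices)
```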

(c) Link Relative Method. This is the most satisfactory
method, developed by Prof. Pearson. This method, also called
"Pearson's Method", expresses each monthly figure as a
relative of the immediately preceding month. The seasonal
pattern is found by averaging all the link relatives for the same
month and taking residual trend out of the chain relatives
computed from these average link relatives. The steps involved
under this method are :

1. Translate the original data into link relatives, expressing
each monthly figure as a percentage of the figure for the
preceding month. Link relative for the month of April 1949 will
be found as :

    (Figure of April 1949 / Figure of March 1949) × 100

2. Sort out the link relatives by months and obtain the
mean or median (median will be easier to locate) link relative
for each month.

3. Convert the series of median link relatives into "series of
chain relatives". The first chain relative (for January) must be
fixed at 100. The chain relative of any month is obtained by
multiplying the link relative of that month by the chain relative
of the next preceding month, and dividing by 100. The process
is continued until we obtain chain relatives for all the 12
months and for January second time also. The chain relative
for January was assumed to be 100. The second chain relative
for January will be :

    (Chain relative for December × link relative for January) / 100

4. The last chain relative (for January second time) must
be 100. But this would not usually be so due to the presence
of the element of trend. The difference between the two chain
relatives for January will represent the trend increment or
decrement. It is necessary to adjust the chain relatives for the
effect of the trend. If the last chain relative is greater than
100, the correction factor is to be deducted ; if it is less than
100, it is to be added. The first month is kept at 100. But
from the next month the correction factors should be added or
subtracted as the case may be. If we have monthly data the
correction factor for the second month would be 1/12 of the
amount by which the final chain relative differs from 100. The
correction factor for the third month would be 2/12 of this
amount, then 3/12, 4/12 and so on. In our illustration the
correction factor for March is thus 2/12 of this amount.

5. The last step is to express the corrected relatives as
percentages of their arithmetic mean. This is done by dividing
each one of the chain relatives by 1/12 of the total of the twelve
items (adjusted chain relatives) and multiplying by 100. These
are the adjusted monthly indices of seasonal variation. The
percentages thus obtained can be plotted on a chart showing
diagrammatically the seasonal influence.

TABLE 16.10

Seasonal Variation - Link Relative Method
Data Used is of Table 16.6
Link Relatives

[Table body illegible in the source.]
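The five steps of the link relative method can be sketched in Python as follows. This is an illustration under stated assumptions: the series is assumed to start in January and to cover complete years, the median is used in step 2, and the figures in the usage example are hypothetical (a trendless pattern repeated for three years), not the data of the illustration.

```python
from statistics import median


def link_relative_indices(years):
    """years: list of yearly lists, each holding 12 monthly figures."""
    flat = [value for year in years for value in year]

    # Step 1: each figure as a percentage of the preceding month's figure.
    links = [[] for _ in range(12)]
    for i in range(1, len(flat)):
        links[i % 12].append(flat[i] / flat[i - 1] * 100)

    # Step 2: median link relative for each month.
    med = [median(month_links) for month_links in links]

    # Step 3: chain relatives, with January fixed at 100.
    chain = [100.0]
    for m in range(1, 12):
        chain.append(med[m] * chain[m - 1] / 100)
    second_january = med[0] * chain[11] / 100

    # Step 4: spread the discrepancy over the months (1/12, 2/12, ...).
    discrepancy = second_january - 100
    corrected = [c - discrepancy * m / 12 for m, c in enumerate(chain)]

    # Step 5: express each corrected relative as a percentage of their mean.
    mean = sum(corrected) / 12
    return [round(c / mean * 100, 1) for c in corrected]


# Hypothetical trendless pattern repeated for three years.
pattern = [90, 95, 100, 105, 110, 100, 100, 100, 100, 100, 100, 100]
print(link_relative_indices([pattern] * 3))
```

Because the hypothetical series has no trend, the second chain relative for January comes back to 100 and no correction is needed; with real data the correction in step 4 removes the residual trend.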

(d) Moving Average Method. For finding an index of
seasonal variation based on twelve-month moving average
method, the following steps are to be followed :

1. Compute a twelve-month moving total of the original
data. The total for the months July 1939 to June 1940 is
recorded in between December 1939 and January 1940, that
of August 1939 to July 1940 in between January and February
1940. These two totals are then added and divided by 24.
The result is written against January. The process is continued
till we get the moving average for December 1946. The
averages so obtained are shown in Table 16.11.

2. Divide each original figure by the corresponding moving
average. State the result as a percentage. (See table 16.12)

3. Find out the mean percentage for each month.



TABLE 16.11

Seasonal Variation - Moving Average Method - 12 Month
Moving Averages (Data used of Table 16.8)

Month     1940    1941    1942    1943    1944    1945    1946

Jan.      84.4    80.4    78.5    91.6    93.4    95.4    99.5
Feb.      84.9    79.1    79.1    93.0    92.6    96.3    99.8
March     85.2    77.9    79.7    94.1    92.3    96.8    100.3
April     85.4    77.1    80.4    94.9    92.4    97.0    101.0
May       85.9    76.3    81.4    95.4    92.4    97.3    101.6
June      86.1    75.6    82.7    95.6    92.4    97.7    101.9
July      85.5    75.4    83.9    95.6    92.8    97.8    102.0
August    84.6    75.8    85.0    95.8    93.1    97.9    102.3
Sept.     83.8    76.7    85.9    96.0    93.2    98.1    102.6
Oct.      82.8    77.3    87.1    95.7    93.5    98.3    102.9
Nov.      82.3    77.5    88.7    95.0    94.1    98.6    102.9
Dec.      81.4    77.9    90.2    94.3    94.7    98.1    102.8


TABLE 16.12

Ratios to Moving Averages (Percentages)

Months   1940    1941    1942    1943    1944    1945    1946    Mean    Seasonal
                                                                         index
         1       2       3       4       5       6       7       8       9

Jan.     97.2    87.1    89.2    91.7    91.1    96.4    92.5    92.2    92.1
Feb.     91.2    85.9    96.1    90.3    97.2    93.5    91.2    92.2    92.1
Mar.     98.6    87.3    102.9   96.7    98.6    95.0    95.7    96.4    96.3
April    85.5    97.9    89.6    99.1    94.2    96.9    94.1    93.9    93.8
May      100.1   98.3    95.8    101.7   96.3    98.7    101.4   98.9    98.8
June     99.9    100.5   101.6   105.7   97.4    100.3   100.1   100.8   100.7
July     105.3   99.5    97.7    102.3   95.9    100.2   101.9   100.4   100.3
Aug.     113.5   106.9   102.4   108.6   104.2   110.3   107.5   107.6   107.5
Sep.     115.8   109.6   108.3   107.3   111.6   109.1   112.1   110.5   110.4
Oct.     120.8   120.3   113.7   112.9   116.6   112.9   116.6   116.3   116.2
Nov.     106.9   98.1    105.9   103.2   103.8   103.5   103.9   103.3   103.4
Dec.     88.3    87.3    92.0    87.0    88.9    89.7    87.6    88.7    88.6

4. Express these mean percentages as a percentage of
their own average. This is the index of seasonal variation.
(See column 9, Table 16.12)

This method has been recommended as the most satisfactory
method of estimating a typical seasonal index. This
method irons out the influences of trend and cyclical
fluctuations from the index of seasonal variation. We should,
however, remember that a moving average technique can be
successfully employed only if the data contains variations of
uniform periodicity and amplitude, and the seasonal variation
is a regular one.

This method is also termed the "ratio to moving average
method" because the second step in its estimation is to divide
the original figures by the corresponding moving average. By
employing the second step, in fact, we have divided the raw
data by trend and cycle and thereby have left in the percentages
the seasonal and erratic variations.

The object of step 3 is to eliminate from these percentages
the influences of erratic causes.
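The four steps of the ratio to moving average method can be sketched in Python as below. The function names are our own and the series is assumed to begin in January. The figures for 1939 and 1940 are those of the illustration (table 16.9) and are used only to check the first centred average and ratio; the trendless `pattern` series is hypothetical.

```python
def centred_moving_average(values, period=12):
    """Step 1: two successive moving totals added and divided by 2 x period."""
    half = period // 2
    averages = [None] * len(values)
    for i in range(half, len(values) - half):
        first_total = sum(values[i - half:i + half])
        second_total = sum(values[i - half + 1:i + half + 1])
        averages[i] = (first_total + second_total) / (2 * period)
    return averages


def ratio_to_moving_average_indices(values, period=12):
    averages = centred_moving_average(values, period)
    # Step 2: original figure as a percentage of its moving average.
    ratios = [[] for _ in range(period)]
    for i, (value, average) in enumerate(zip(values, averages)):
        if average is not None:
            ratios[i % period].append(value / average * 100)
    # Step 3: mean percentage for each month.
    means = [sum(r) / len(r) for r in ratios]
    # Step 4: means expressed as a percentage of their own average.
    overall = sum(means) / period
    return [round(m / overall * 100, 1) for m in means]


# Monthly figures for 1939 and 1940 from the illustration.
data_1939_40 = [72, 68, 69, 71, 75, 80, 85, 89, 96, 97, 80, 75,
                82, 77, 84, 73, 86, 86, 90, 96, 97, 100, 88, 72]
january_1940_average = centred_moving_average(data_1939_40)[12]
print(january_1940_average, round(82 / january_1940_average * 100, 1))

# Hypothetical trendless pattern repeated for three years.
pattern = [90, 95, 100, 105, 110, 100, 100, 100, 100, 100, 100, 100]
print(ratio_to_moving_average_indices(pattern * 3))
```

The first print reproduces the entries for January 1940 in tables 16.11 and 16.12 (84.4 after rounding, and 97.2).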

Cyclical Fluctuations 

If we divide the actual values by their respective trend
values and seasonal indices the result will represent the
percentage cyclical fluctuations month by month. Thus the steps
involved in its computation are :

1. Calculate monthly trend value by the moving average
method.

2. Calculate monthly seasonal indices preferably by the
moving average method.

3. Find the percentage ratios of the actual items to the
trend values (dividing the actual by trend value and
multiplying by hundred).

4. Divide the value obtained under step 3 by the seasonal
index.

The result will be the percentage cyclical fluctuation when
the data does not contain any random fluctuations.
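Steps 3 and 4 amount to two successive divisions, which may be sketched as below. The figures are hypothetical, and the sketch assumes that the seasonal index is stated as a percentage (average 100), so that the quotient in step 4 is multiplied by 100 to keep the result in percentage form.

```python
def cyclical_percentage(actual, trend, seasonal_index):
    """Percentage cyclical fluctuation for one month (random element ignored)."""
    ratio_to_trend = actual / trend * 100           # step 3
    return ratio_to_trend / seasonal_index * 100    # step 4


# Hypothetical month: actual 110, trend value 100, seasonal index 105.
print(round(cyclical_percentage(110, 100, 105), 1))
```

A result above 100 indicates that, after allowing for trend and season, the month stands above the normal level.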





EXERCISES 

1. Explain fully what is meant by 'Secular Trend'. Name the important
types of movements influencing a time series.

2. Differentiate between the series :

(a) which exists at a point of time,

(b) which is spread over a period of time.

3. Write short notes on :

(1) Periodicity,

(2) Moving Average,

(3) Irregular fluctuations, and

(4) Line of best fit.

4. What are the various methods of estimating seasonal variations in a
particular historical series ?

5. Calculate five-yearly moving average of students reading in a
Commerce College shown by the following figures :

Year :

1 2 3 4 5 6 7 8 9 10 11 12 13 14

No. of Students :

328, 31?, 357, 392, 402, 405, 410, 427, 405, 438, 445, 447, 480, 482

(B e**., turn, ms] fs-U 

6. Calculate the five-yearly moving average for the following time series
and plot it with the original figures on the same graph. Next calculate
seven-yearly moving average and plot it on the same graph. Comment
on the reversal effect.

Year         1   2   3   4   5   6   7   8   9   10

Annual fig.  110 104 96  103 109 120 115 110 114 J»

Year         11  12  13  14  15  16  17  18  19  20

Annual fig.  190 127 122 118 130 140 135 130 127 135

Year         21  22  23  24  25  26  27  28  29  30

Annual fig.  146 142 138 135 145 155 13© 148 143 138

(Af.C**»., B***r*i, (5.2] 

7. Explain how you will deal with a time series, and illustrate your
remarks with the help of the following series of annual figures for the
period 1901-30 :

Period     Annual Values

1901-10    208  223  223  222  239  242  238  252  257  23$
1911-20    273  270  268  288  284  m    300  303  298  313
1921-30    317  309  329  333  327  344  315  343  362  360

(I CJS U ip 9 ) {5*6) 

8. Assuming a ten-yearly cycle for the following series relating to Index
Number of the retail price of wheat in India (1873 = 100) give the trend
values and represent graphically the short time fluctuations with the
trend removed.

Year        1906 1907 1908 1909 1910 1911 1912 1913 1914
Annual Av.  155  168  226  203  170  158  170  IV   200

Year        1915 1916 1917 1918 1919 1920 1921 1922 1923
Annual Av.  22?  193  205  270  341  310  360  315  356

Year        1924 1925 1926 1927 1928 1929
Annual Av.  246  294  281  267  264  262








(ftCwn., AtUksM, rag#) 

(141 

9. Explain the use of moving average in the analysis of time series.
Find out an approximate moving average for the following series :



Year 

1901 

1902 

im 

1904 

1905 

1906 

190? 

1908 

1909 

Figures 

506 

620 

urn 

673 

588 

6% 

1116 

730 

66$ 

Year 

1910 

1911 

1912 

1913 

1914 

19*5 

1916 

UI7 

1918 

figures 

777 

1189 

818 

74$ 

845 

1776 

m 


m 

Year 

1919 

1929 

1921 








Figures 1380 Ml 926 


(A14. f Csfrafl*, r?J*) |5.5] 





10. Below are given figures of production (in 000's mds.) of a sugar factory.

Vc*r mi m2 ms im ms im mi 

Fioducifen in G09*t mb. 60 90 92 83 94 W 92 

(a) Find the slope of a straight line trend to these figures.

(b) Plot these figures on a graph and show the trend line.

(c) Do these figures show a rising trend or a falling trend ?

How do you arrive at your conclusion ? [5.7]

11. Fit a straight line trend to the following figures of Index Numbers :


Year 

1939 

1910 

1941 

1942 

1943 1944 

1945 

1946 

I . No 

100 


187 

221 8 

245-5 238-9 

233*6 

229-2 

Year 

1947 

mu 

1949 

1980 

1951 1952 

1953 


I No. 

191 6 

2599 

181-5 

24frfi 

310-5 170*6 

261-2 







tRaj , 

'95? 

f5.8j 

12. Fit a straight line trend by the method of least squares
using short-cut method.

Year 

1941) 

1941 

1942 

1943 

1944 1945 

1946 

1947 

Milk con* 
suxxiptaui 
m million 
gallon* 

ms 

101 9 

1058 

1120 

114 8 118? 

125-5 

102.9 








(59J 

Fit a quadratic parabolic trend to the following figures 



Year 


;<»24 

1927 

1950 

1931 1930 

1930 

1942 

Index of coal 
export perm it 

ia; 

142 

m 

124 156 

169 

270 





K 

B Com., Delhi, 

*955? 

(311) 


14. Assuming that trend is absent determine the seasonality in the
following data :

Year    1st Quarter    2nd Quarter    3rd Quarter    4th Quarter
1%V 3 7 fi :V3 3 5 

i m vs 3 6 sr> 

\m 41) 4 1 S',4 3 1 

1940 S 3 4 4 4*0 4*0 

15. The following table gives the values of exports of merchandise from
India during the years 1919-20 to 1923-24. Calculate the seasonal
variation for each month during this period.


Mouth* 

1919*20 

1920*21 

1921*22 

1922*23 

1923*24 

April 

May 

20 

27 

17 

23 

29 

20 

26 

18 

26 

n 

June 

19 

21 

15 

16 

29 

Uy 

26 

19 

17 

23 

23 

August 

2$ 

19 

18 

24 

22 

September 

30 

21 

19 

20 

23 

October 

28 

19 

1) 

21 

23 

November 

29 

17 

19 

27 

26 

December 

26 

18 

21 

26 

SO 

Jam) ary 

29 

18 

n 

28 

SO 

February 

26 

17 

21 

SO 

S3 

March 

$0 

18 

26 

31 

40 


(».uj 





16. Calculate the seasonal index from the following data giving output of
coal in million tons by the link relative method and ratio to moving
average method.


i*i Quarter 
6*1 


2 nd Owarter 

62 


3rd Quarter 4?h Quarter 

Cl ®1 


2 

65 

58 

% 

61 

3 

68 

63 

(t3 

67 

4 

70 

59 

56 

62 

5 

60 

55 

51 

58 

j 5.14 

17. Using the data given below, explain clearly how you would determine
seasonal fluctuations of a time series.

Year 

Summfi 

Monsoon 

Autumn 

Winter 

1 

30 

81 

(>.! 

119 

•t 

3 

3i 

104 


57? 

42 

IM 

99 

221 

4 

56 

172 

129 

235 

5 

o; 

201 

m 

w 


18. Find the trend eliminated figures for the data of questions 5 and 12.

[5.16]

19. Remove the effect of season from the data of question 16. [5.17]

20. Analyse the following figures of output of coal in Great Britain so as
to arrive at the extent of (a) seasonal movements and (b) irregular
fluctuations.

(Output of coal in million tons)

Year IM Quarter 2nd Quarter 3rd Quarter 4th Quarter 


Year 

Ut Quarter 

2nd Quarter 

3rd Quarter 

4th Qua< 

1927 

68 3 

626 

6H 

tm 

1928 

65'4 

57 9 

56*4 

61 5 

1929 

68 1 

62 7 

62 8 

67 0 

1930 

70'1 

5*1 

56*3 

36 0 

1931 

5*5 

54 8 

511 

580 


i M Com , Allahabad, tptf) 

21. Following figures give the yearly trend averages of sales of a certain
shop. Calculate the monthly trend figures for all the years.

Year                    1     2     3     4     5

Yearly average in Rs.   1200  1344  1468  1632  1776

What would be your figures had the above figures been the yearly
totals ? [5.21]

22. Study the short time fluctuations of the following temperature
measures in degrees Fahrenheit :

Date February, 1941   1   2   3   4   5   6   7   8   9   10

Temperature           40  50  44  70  32  44  36  40  56  68

Date February, 1941   11  12  13  14  15  16  17  18  19  20

Temperature           78  80  60  64  52  68  86  96  94  78

{B.Cam., AUehaM, ryaa) f 5-27] 

23. Find out seasonal variations by the ratio to trend method from the
data given below :

Year    I Quarter    II Quarter    III Quarter    IV Quarter
1950    30           40            36             3?
1951    34           52            50             44
1952    40           58            54             48
1953    54           76            68             62
1954    80           92            8$             82      [5.23]

24. Calculate the cyclical fluctuations by the residual method for the
data of question 23. [5.24]

25. Fit a second degree parabola to the following :

X l 2 3 4 5 6 7 8 9 

r 2 6 7 8 10 H II 10 0 

J5J0J 



Chapter 17 
Index Numbers 


In the preceding chapter we have discussed the various
methods that are used to analyse the fluctuations in a statistical
series. Here it is intended to consider a device, known as
Index Number, that is used to measure these fluctuations.

If our data consist of the measurement of a single variable
at different points of time or at different places it does not
require any special technique to measure their variations. Thus,
if we are given the following information with regard to prices
of a commodity :

Year                  1940   1941   1942   1943   1944

Price per md. (Rs.)   16     20     22     18     24

a little mental arithmetic will show that the price in the year
1944 has gone up by Rs. 8 per maund as compared to 1940
prices, or that the price in 1943 is two rupees more as compared
to 1940. But if it is desired to compare the variations in the
price of one commodity with those in the price of another the
absolute measure of the change will not be of much help. In
the following table are given the prices of two commodities
x and y.

Year           1940   1941   1942   1943   1944

Commodity x    16     20     22     18     24

Commodity y    4      5      6      7      8

If we compare the figures as they are given above we are
led to believe that there is a greater rise in the price of
commodity x, its price having gone up by Rs. 8 as compared to a
rise of only four rupees in the price of commodity y. But a
little reflection will enable us to realise that though the rise in
the price of x, measured in absolute terms, is more, the rate
of the change in y is twice as great as it is in x. Such a
comparison is clearly brought out if the course of prices is expressed
as in the table below :

Year               1940   1941   1942    1943    1944

Price Relatives
  Commodity x      100    125    137.5   112.5   150
  Commodity y      100    125    150     175     200


In this table the price of each commodity for each successive
year is expressed as percentage of the price in 1940, which is
termed the base year. Thus the figure representing the price
of commodity x in 1941 is (20/16) × 100 = 125. Such figures are
termed price relatives or simply 'relatives'. A relative may
also be expressed as (P1/P0) × 100, where P1 represents the price
in the current year and P0 represents the price in the base year.
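The computation of price relatives for the two commodities can be sketched in Python as follows. The prices are those given above for commodities x and y; the function name is our own.

```python
def price_relatives(prices, base=0):
    """Each price as a percentage of the price in the base year."""
    base_price = prices[base]
    return [round(p / base_price * 100, 1) for p in prices]


YEARS = [1940, 1941, 1942, 1943, 1944]
x = [16, 20, 22, 18, 24]
y = [4, 5, 6, 7, 8]

print(price_relatives(x))
print(price_relatives(y))
```

By 1944 the relative for y stands at 200 against 150 for x, which is the sense in which the rate of change in y is twice as great.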

Index Number in its simplest form is understood to mean
such price relatives. But the common practice is to use this
term for figures which represent the variations in a group of
related series. The series to be grouped may relate to
production, prices, unemployment, cost of living, etc. Thus if it is
desired to compute an index number of industrial production
for a particular year, we will group together the output of
textiles, steel, chemicals, etc., for the year in question and
compare it with the output of the base year. The device which
enables us to combine the variations in different series with a
view to obtain a figure that faithfully represents the net result
of the change in the constituent elements is called an Index
Number.

The Construction of Index Numbers*

The various methods of constructing index numbers may be
classified as under :

1. The unweighted (simple) method :

(a) The aggregate of actual prices,

(b) The average of relatives (price relatives).

2. The weighted method :

(a) The weighted aggregate of actual prices,

(b) The weighted average of relative prices.

* For purposes of explanation the discussion is confined to index
numbers of prices.

Simple Aggregate of Actual Prices

Under this method the sum of commodity prices in the
current year is expressed as a percentage of the sum of the
prices for the same commodities in the base year.

    Index Number = (ΣP1 / ΣP0) × 100

where ΣP1 = sum of prices of the commodities in the current
year,

ΣP0 = sum of prices of the commodities in the base year.
TABLE 17.1

Index Number based on Aggregate of Actual Prices

Commodity   Unit           1941   1942    1943
                           Rs.    Rs.     Rs.

A           per maund      100    90      110
B           per maund      10     12      9
C           per maund      5      5       4
D           per maund      4      5       2
E           per chhatak    1      3       1

Total                      120    115     126

Index Nos.
with 1941 as base          100    95.83   105

In table 17.1 the prices of the various commodities for
1941, 1942 and 1943 have been aggregated, and the totals for
1942 and 1943 thus obtained have been expressed as
percentages of 1941 total, giving an index of 95.83 and 105 for 1942
and 1943 respectively.
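The aggregation can be sketched in Python as below. The prices are as read from table 17.1 (they reproduce the printed totals of 120, 115 and 126); the function name is our own.

```python
# Prices for 1941, 1942 and 1943, as read from table 17.1.
PRICES = {
    "A": [100, 90, 110],
    "B": [10, 12, 9],
    "C": [5, 5, 4],
    "D": [4, 5, 2],
    "E": [1, 3, 1],
}


def simple_aggregate_index(prices, base=0):
    """Sum of prices in each year as a percentage of the base-year sum."""
    totals = [sum(year_prices) for year_prices in zip(*prices.values())]
    return [round(t / totals[base] * 100, 2) for t in totals]


print(simple_aggregate_index(PRICES))
```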

The chief weakness of this type of index number is that
those commodities which have large figure quotations dominate
the index. For instance, a decrease of 10% in the price of
commodity A for 1942 is enough to bring down the index in
spite of the fact that there is an increase of 20%, 20%, 25%
and 300% respectively in the prices of commodities B, C, D
and E.

For 1943 an increase of only 10% in A is more than enough
to counterbalance a decrease of 10%, 20% and 50% respectively
in the prices of B, C and D. In other words, it can be said
that the importance (weights) assigned to various commodities
under this system is directly related to their price quotations,
large figure quotations imply large weights and small figure
quotations small weights.

Thus in table 17.1 the price quotation of commodity A for
1941 is five times as large as those of the remaining four
commodities put together. This means that the total weight
assigned to A is five times the weight assigned to the other four
commodities. Now this system of weighting is illogical for the
simple reason that it is unscientific and hence the index
numbers thus obtained do not faithfully represent the variations
of the group under consideration. This defect of unscientific
weighting cannot be remedied by adopting uniform units, say
maunds, for purposes of price quotations. Such a device will
introduce new inequalities in place of old ones and will not in
any way be helpful. It can thus be concluded that this system
is quite unreliable as a measure of price change.

TABLE 17.2

Index Numbers based on Aggregate of Actual Prices

Commodity   Unit           1931   1932    1933    1934    1935
                           Rs.    Rs.     Rs.     Rs.     Rs.

A           per maund      10     9       8       9       11
B           per seer       4      5       3       6       4
C           per chhatak    6      8       5       4       6
D           per maund      11     13      15      14      12
E           per seer       6      8       10      14      12

Total                      37     43      41      47      45

Index Number
with 1931 as base          100    116.2   110.8   127.0   121.6

Index Number
with 1935 as base          82.2   95.5    91.1    104.4   100

The Simple Average of Relatives

Another method of computing an index of a group of
quotations is to average the relatives of individual commodities.
The relative of an individual commodity (as explained earlier)
is the per cent ratio of the price (for the given period) to the
corresponding price for the base period. For the purpose of
averaging, the mean, median, geometric mean, or harmonic mean
may be used.

The Simple Mean of Relatives Method. According to this
method the arithmetic average of the price relatives is
computed to obtain the required index number.

Algebraically :

    Mean of relatives = [ (P1/P0) × 100 + (P1'/P0') × 100 + (P1''/P0'') × 100 + ... ] / N

where P1 represents the price of the current year, P0 represents
the price of the base year, N represents the number of
commodities included in the index.

Expressed in the form of a formula :

    (1/N) Σ [ (P1/P0) × 100 ]

TABLE 17.3

Index Number based on Mean of Relatives
Data of Table 17.2

Commodity           1931   1932    1933    1934    1935

A                   100    90      80      90      110
B                   100    125     75      150     100
C                   100    133     83      67      100
D                   100    118     136     127     109
E                   100    133     167     233     200

Total               500    599     541     667     619

Mean of Relatives   100    119.8   108.2   133.4   123.8

The resulting composite index number shows that on an
average these five prices rose to 123.8 in 1935 as compared to
100 in 1931.

If we take 1935 as base year and find out the relatives of
the five commodities for the year 1931, then the index number
will be :

    (The relatives of A + B + C + D + E) / 5

taking 1931 as given year and 1935 as Base Year,

    = (90.9 + 100.0 + 100.0 + 91.7 + 50.0) / 5 = 86.5

when the Base Year is 1935.

Simple average of relatives method is not influenced by
units (maunds, seers) in which prices are quoted or by the
largeness or smallness of a price quotation. The price relative
of commodity A for 1933 with 1931 as base will remain the
same whether we quote its price in maunds or seers.

Like the previous method, this method also involves a kind
of illogical weighting. The quantity used as weight is the
amount of the commodity which could be bought in the base
year for Rs. 100. Illustrating from the price of commodity E,
if its price in the base year is Rs. 6 per seer then we could
purchase about 16 seers 10 chhataks for Rs. 100 in 1931 (base year).
The same quantity will cost Rs. 200 in 1935 when the rate is
Rs. 12 per seer. In other words, to purchase that quantity of
commodity E which could be purchased in 1931 for Rs. 100 it
is necessary to pay Rs. 200 in 1935. The weight assigned to E
in this case is 16.7 seers (approx.), to A 10.0 maunds, to B 25.0
seers, to C 16.7 chhataks, and to D 9.1 maunds. This system
of weighting is unscientific for the simple reason that weights
have not been fixed in accordance with the relative importance
of the commodities.
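The mean of relatives with 1931 as base can be sketched in Python as below, using the prices of table 17.2. To reproduce the printed figures, each relative is rounded to the nearest whole number before averaging, as is done in table 17.3; this rounding and the function name are our own choices.

```python
# Prices for 1931 to 1935, from table 17.2.
PRICES = {
    "A": [10, 9, 8, 9, 11],
    "B": [4, 5, 3, 6, 4],
    "C": [6, 8, 5, 4, 6],
    "D": [11, 13, 15, 14, 12],
    "E": [6, 8, 10, 14, 12],
}


def mean_of_relatives(prices, base=0):
    """Simple arithmetic mean of the price relatives for every year."""
    n_years = len(next(iter(prices.values())))
    index = []
    for year in range(n_years):
        # Relatives rounded to whole numbers, as in table 17.3.
        relatives = [round(p[year] / p[base] * 100) for p in prices.values()]
        index.append(round(sum(relatives) / len(relatives), 1))
    return index


print(mean_of_relatives(PRICES))
```

Changing the unit of quotation for any commodity multiplies all its prices by the same factor and leaves its relatives, and hence the index, unchanged.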

Median of Relatives Index Number. When the data are
characterised by marked differences between the items (i.e.,
when the data contain very high or low items) the mean might
not be so typical an average as the median. In our illustration
the price of commodity E rose by 133% in the year 1934, while
the price of commodity B rose by 50% in the same year and
the price of commodity C fell by 33%.

Where the data are marked by such conditions, it is
preferable to compute group index numbers on the basis of the
median rather than the simple arithmetic mean.

The relatives for the year 1935 arranged in an ascending
order indicate that the median is 109 (100, 100, 109, 110, 200).

The medians of relatives for the various years are as follows:

Year     Median     Relatives (arranged)

1931      100       100    100    100    100    100
1932      125        90    118    125    133    133
1933       83        75     80     83    136    167
1934      127        67     90    127    150    233
1935      109       100    100    109    110    200


If we fix 1935 as base year the index for 1931 based on 1935
will be, according to this method, 91.7, as the relatives
computed are 50.0, 90.9, 91.7, 100, 100.
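
The medians quoted above may be verified with the standard library of Python. The sketch below assumes the same relatives as the illustration; the names `median_index` and `reverse_median` are ours.

```python
from statistics import median

# Price relatives of commodities A to E, 1931 = 100 (figures of the illustration).
relatives = {
    1931: [100, 100, 100, 100, 100],
    1932: [90, 125, 133, 118, 133],
    1933: [80, 75, 83, 136, 167],
    1934: [90, 150, 67, 127, 233],
    1935: [110, 100, 100, 109, 200],
}

# With five items the median is simply the third item of the arranged series.
median_index = {year: median(v) for year, v in relatives.items()}

# Relatives of 1931 prices on the base 1935, as quoted in the text.
reverse_median = median([90.9, 100.0, 100.0, 91.7, 50.0])

print(median_index)
print("1931 on base 1935:", reverse_median)
```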

Geometric Mean of Relatives Index Number. From the practical
point of view, geometric mean of relatives is not very
attractive. It is not easy either to calculate or even to
understand the procedure of its computation. The main argument
in its favour is that it weights equal ratio differences alike,
just as the simple mean weights equal arithmetic differences
alike. This is illustrated by the following example:

The relative prices of two products are:

Product     Period I (base year)     Period II (current year)

X                  100                        200
Y                  100                         50

\[
\text{Arithmetic mean} = \frac{200 + 50}{2} = 125
\]

\[
\text{Geometric mean} = \sqrt{200 \times 50} = 100
\]







TABLE 17.4

Index Number based on the Geometric Mean of Relatives

[Table illegible in this copy. For each commodity (A to E) and
each year (1931 to 1935) it gives the price relative and its
logarithm, with the total and the average of the logarithms
for each year.]

                        1931     1932     1933     1934     1935

Index No. 1931 = 100   100.0    118.7    102.5    121.7    119.1




The arithmetic differences from the mean are the same:
(200 - 125) = 75 and (125 - 50) = 75. The geometric or ratio
differences from the geometric average are the same:
200 ÷ 100 = 2, and 100 ÷ 50 = 2.

The geometric mean is a root extracted from the product
of a variety of items and is easily and quickly calculated by the
use of logarithms. Reverting to our old illustration (Table 17.2)
the geometric mean of relatives is calculated as in Table 17.4.

Thus the geometric mean of relatives is the index whose
logarithm is the simple average of the logarithms of individual
price relatives. By this method (as shown in Table 17.4) the
index of prices of five commodities is 119.1 in 1935 with 1931
as base.

If the year 1935 is taken as base year the index number for
1931 with the help of this technique would be:

                  Relatives     Logarithms of Relatives

A                    90.9              1.9586
B                   100.0              2.0000
C                   100.0              2.0000
D                    91.7              1.9624
E                    50.0              1.6990

Total                                  9.6200
Average                                1.9240

Index No. 1935 = 100                   83.95
                               (Antilog of 1.9240)

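Algebraically the formula may be stated as:

\[
\text{G.M. of Relatives} = \sqrt[n]{\frac{p_1}{p_0} \times \frac{p_1'}{p_0'} \times \frac{p_1''}{p_0''} \times \cdots}
\]

or

\[
\text{Antilog} \left[ \frac{\log\left(\frac{p_1}{p_0}\right) + \log\left(\frac{p_1'}{p_0'}\right) + \log\left(\frac{p_1''}{p_0''}\right) + \cdots}{n} \right]
\]

where $n$ is the number of commodities.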









The steps to be followed in this method may be summarised
as:

(1) Select a base year.

(2) Translate the prices of each commodity for each year
into a series of relatives.

(3) Compute the logarithms of the relatives.

(4) Find the sum of the logarithms for each year.

(5) Find the arithmetic mean of the logarithms for each
year.

(6) Finally calculate the antilogarithms of the figures
obtained after the fifth step.
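
The six steps above can be sketched in Python as follows. The function name `geometric_mean_index` is ours, and the relatives are those of the illustration for 1935 on 1931 and for 1931 on 1935. Because the machine carries more decimal places than four-figure logarithm tables, the last digit may differ slightly from the printed figures.

```python
import math


def geometric_mean_index(relatives):
    """Index whose logarithm is the simple mean of the logs of the relatives."""
    logs = [math.log10(r) for r in relatives]   # step 3: logarithms
    mean_log = sum(logs) / len(logs)            # steps 4 and 5: total, then mean
    return 10 ** mean_log                       # step 6: antilogarithm


# 1935 on the base 1931, and 1931 on the base 1935.
forward = geometric_mean_index([110, 100, 100, 109, 200])
backward = geometric_mean_index([90.9, 100.0, 100.0, 91.7, 50.0])

print(round(forward, 1), round(backward, 2))
```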

Harmonic Mean of Relatives Index Number. Instead of
computing the average of price relatives with the help of mean,
median or geometric mean, we may employ harmonic mean
under certain circumstances. Like the previous one, this
method also involves some difficulty in calculations. Under
this method we have to compute reciprocals of relatives, total
them and find out the simple mean of the reciprocals of
relatives. The index number will be the reciprocal of the mean
of the reciprocals of relatives.

For a price relative of the form $\frac{p_1}{p_0}$ the
reciprocal will be $\frac{p_0}{p_1}$.

The formula for the harmonic mean will be:

\[
H = \frac{1}{\dfrac{\frac{p_0}{p_1} + \frac{p_0'}{p_1'} + \frac{p_0''}{p_1''} + \cdots}{n}}
\]




Taking our previous illustration the method of computation
is as follows:

TABLE 17.5

Index Number based on the H.M. of Relatives
Indices of prices of 5 Commodities (See Table 17.2)

             1931           1932           1933           1934           1935
           Rel. Recip.    Rel. Recip.    Rel. Recip.    Rel. Recip.    Rel. Recip.

A          100  .0100      90  .0111      80  .0125      90  .0111     110  .0091
B          100  .0100     125  .0080      75  .0133     150  .0067     100  .0100
C          100  .0100     133  .0075      83  .0120      67  .0149     100  .0100
D          100  .0100     118  .0085     136  .0074     127  .0079     109  .0092
E          100  .0100     133  .0075     167  .0060     233  .0043     200  .0050

Total           .0500          .0426          .0512          .0449          .0433
Average         .0100          .0085          .0102          .0090          .0087

Index No.
1931 = 100      100            117.6          98             111.1          114.9


If the year 1935 is taken as base year and the year 1931 as
current year then the Index Number will be as follows:

                        Relatives     Reciprocals

A                          90.9         .0110
B                         100.0         .0100
C                         100.0         .0100
D                          91.7         .0109
E                          50.0         .0200

Total                                   .0619
Average                                 .0124

Index No. 1935 = 100                    80.6
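
The harmonic mean of relatives can likewise be sketched in Python. The function name `harmonic_mean_index` is ours. The tables above round each reciprocal to four decimal places, so the figures worked by hand (114.9 and 80.6) differ a little from the unrounded results of the sketch, which come to roughly 115.6 and 80.8.

```python
def harmonic_mean_index(relatives):
    """Reciprocal of the simple mean of the reciprocals of the relatives."""
    reciprocals = [1 / r for r in relatives]
    mean_reciprocal = sum(reciprocals) / len(reciprocals)
    return 1 / mean_reciprocal


# 1935 on the base 1931, and 1931 on the base 1935 (relatives of the illustration).
forward_hm = harmonic_mean_index([110, 100, 100, 109, 200])
backward_hm = harmonic_mean_index([90.9, 100.0, 100.0, 91.7, 50.0])

print(round(forward_hm, 1), round(backward_hm, 1))
```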









Comparison of Index Numbers

We have computed index numbers of a group of five
commodities by different techniques and have obtained a
different figure by each one of the five methods. The results
obtained range from 123.8 to 109, as shown in the table below:

Method used                    Index Number for 1935
                               with 1931 as base

1. Simple Aggregate                   121.6
2. Mean of Relatives                  123.8
3. Median of Relatives                109.0
4. Geometric Mean                     119.1
5. Harmonic Mean                      115.0

Each of these figures purports to indicate the general
percentage which 1935 prices are of 1931 prices. It has already
been pointed out that none of these results is perfect because of
the unscientific weighting that is inherent in each of these
methods. But, ignoring for the time being the question of
weights, it would be appropriate at this stage to judge the
adequacy of the different methods of measuring changes in the
prices that have been explained so far.

The Time Reversal Test

For this purpose Irving Fisher has suggested the use of the
'Time Reversal Test'. This test enables us to determine if a
method will work both ways in time, 'backward and forward'.
This means that the index number for (say) 1935 with (say)
1931 as the base should be the reciprocal of the index number
for 1931 with 1935 as the base, i.e., their product should be
unity. Thus if the price of a commodity has increased to two
rupees per seer in 1954 as compared to one rupee per seer in
1953, we would say that the 1954 price is 200 per cent of the
1953 price and the 1953 price is 50 per cent of the 1954 price.
Now these two figures are reciprocals of one another and their
product (2.00 × .50) is equal to unity. If the method does not
work both ways, i.e., if the index numbers for two years secured
by the same method but with bases reversed are not reciprocals
of each other, there is an inherent bias in the method.
Algebraically the test may be expressed as

\[
P_{01} \times P_{10} = 1
\]

where $P_{01}$ stands for the index for the current year on the
base year omitting the factor 100 (i.e., for price change in
current year as compared with base year) and $P_{10}$ stands for
the index for the base year on the current year without the
factor 100 (i.e., for price changes in base year compared with
current year).

\[
P_{01} = \frac{\text{Index No. for the current year on the base year}}{100}
\]

\[
P_{10} = \frac{\text{Index No. for the base year on the current year}}{100}
\]

Combining the results of the tables illustrated above we get
the following result:

When the base year is 1931 and current year 1935

Base        Aggregative    Mean of      G.M. of      H.M. of      Median of
= 1931      Method         Relatives    Relatives    Relatives    Relatives

1931        100            100          100          100          100
1935        121.6216       123.8        119.1        114.9        109

so $P_{01} = \frac{121.6}{100}$ when Aggregative Method is used

or $P_{01} = \frac{123.8}{100}$ when Mean of Relatives Method is used

or $P_{01} = \frac{119.1}{100}$ when G.M. of Relatives Method is used

or $P_{01} = \frac{114.9}{100}$ when H.M. of Relatives Method is used

or $P_{01} = \frac{109}{100}$ when Median of Relatives Method is used








When 1935 is taken as base and 1931 as current year then:

Base        Aggregative    Mean of      G.M. of      H.M. of      Median of
= 1935      Method         Relatives    Relatives    Relatives    Relatives

1935        100            100          100          100          100
1931        82.222         86.5         83.95        80.6         91.7

so $P_{10} = \frac{82.22}{100}$ when Aggregative Method is used

or $P_{10} = \frac{86.5}{100}$ when Mean of Relatives Method is used

or $P_{10} = \frac{83.95}{100}$ when G.M. of Relatives Method is used

or $P_{10} = \frac{80.6}{100}$ when H.M. of Relatives Method is used

or $P_{10} = \frac{91.7}{100}$ when Median of Relatives Method is used

$P_{01} \times P_{10}$

Aggregative Method       $\frac{121.6}{100} \times \frac{82.22}{100}$  =  .9997952
Mean of Relatives        $\frac{123.8}{100} \times \frac{86.5}{100}$   =  1.07
G.M. of Relatives        $\frac{119.1}{100} \times \frac{83.95}{100}$  =  .9998445
H.M. of Relatives        $\frac{114.9}{100} \times \frac{80.6}{100}$   =  .926
Median of Relatives      $\frac{109}{100} \times \frac{91.7}{100}$     =  .99953





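Note. If we use exact values of $P_{01}$ and $P_{10}$ (without
converting them to decimals), Aggregate Method and G.M. of
Relatives will give the product $P_{01} \times P_{10}$ as exactly
1. In the present case:

By aggregate method: $P_{01} = \frac{45}{37}$ and $P_{10} = \frac{37}{45}$

\[
\therefore \; P_{01} \times P_{10} = \frac{45}{37} \times \frac{37}{45} = 1
\]

By G.M. of Relatives: Relatives corresponding to $P_{01}$ are:
for Commodity $A = \frac{11}{10}$, $B = \frac{4}{4}$, $C = \frac{6}{6}$,
$D = \frac{12}{11}$ and $E = \frac{12}{6}$.

The similar values corresponding to $P_{10}$ are: for Commodity
$A = \frac{10}{11}$, $B = \frac{4}{4}$, $C = \frac{6}{6}$,
$D = \frac{11}{12}$ and $E = \frac{6}{12}$.

\[
\therefore \; P_{01} \times P_{10}
= \sqrt[5]{\frac{11}{10} \times \frac{4}{4} \times \frac{6}{6} \times \frac{12}{11} \times \frac{12}{6}}
\times \sqrt[5]{\frac{10}{11} \times \frac{4}{4} \times \frac{6}{6} \times \frac{11}{12} \times \frac{6}{12}}
= 1
\]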




Allowing for the adjustment of decimals, the aggregate
method, the geometric mean of relatives method and to some
extent the median of relatives method stand this test of
efficiency and can be said to measure what they were computed
to measure. Mean of relatives and harmonic mean of relatives
have respectively an upward and a downward bias.
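
The time reversal test may be tried on all five methods at once with the following Python sketch. It assumes the 1931 and 1935 prices implied by the relatives of the illustration (10, 4, 6, 11, 6 and 11, 4, 6, 12, 12) and works with exact ratios instead of rounded index numbers; all function names are ours.

```python
import math
from statistics import mean, median

# Assumed prices of commodities A to E in the two years being compared.
p_1931 = {"A": 10, "B": 4, "C": 6, "D": 11, "E": 6}
p_1935 = {"A": 11, "B": 4, "C": 6, "D": 12, "E": 12}


def relatives(current, base):
    """Price relatives as plain ratios (the factor 100 is omitted)."""
    return [current[k] / base[k] for k in base]


def aggregative(current, base):
    return sum(current.values()) / sum(base.values())


def geometric(current, base):
    r = relatives(current, base)
    return math.exp(sum(math.log(x) for x in r) / len(r))


def harmonic(current, base):
    r = relatives(current, base)
    return len(r) / sum(1 / x for x in r)


methods = {
    "aggregative": aggregative,
    "mean of relatives": lambda c, b: mean(relatives(c, b)),
    "median of relatives": lambda c, b: median(relatives(c, b)),
    "geometric mean": geometric,
    "harmonic mean": harmonic,
}

# Product of the forward index and the backward index for each method.
products = {
    name: f(p_1935, p_1931) * f(p_1931, p_1935) for name, f in methods.items()
}

for name, value in products.items():
    print(f"{name:20s} {value:.4f}")
```

A product above unity indicates an upward bias and a product below unity a downward bias, which is the conclusion the text draws for the mean and the harmonic mean of relatives.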



According to Fisher, "The test is that the formula for
calculating an index number should be such that it will give the
same ratio between one point of comparison and the other, no
matter which of the two is taken as base" or "putting it another
way, the index number reckoned forward should be the
reciprocal of that reckoned backward."

Weighting of Index Numbers 

It has been discussed in the preceding pages that the two
index numbers, 'the simple aggregative' and 'the simple mean
of relatives', which apparently seem to be unweighted are not
really so. The simple aggregative method is weighted by the
magnitude of the prices (the higher priced commodity has a
greater influence on the result than a lower priced commodity).
It is also influenced by units in which prices are quoted.
Commodities whose prices are quoted in maunds exert greater
influence than those whose prices are quoted in smaller units.
Simple mean of relatives method gives equal weights to all
commodities. Thus a less important commodity has as much
influence on the result as a very important commodity has. In
order to eliminate this unscientific weighting it is necessary that
a scientific system of weighting be introduced which will accord
to each commodity price the importance it should have in the
light of the object in view.

That a system of weighting is necessary to appreciate fully
the impact of the change can be illustrated by the behaviour
of prices of two commodities X and Z. If the price of X rose
by 10% and that of Z fell by 10%, the mean of price relatives
would show no change, and yet the consumers might complain
of a rise in the cost of living if they were spending more on X
than on Z. For them the increased price of X (which is a
necessity) is not offset by the lowered price of Z (which is a
comfort or a luxury). This difficulty can be easily met if the
various commodities are weighted according to their relative
importance. The weighting may be effected by either of the
two following methods:

(a) each price relative may be weighted by a figure
representing the comparative importance of the commodity;

(b) for certain types of commodities several price
quotations may be included while for other commodities
there is only one quotation that is considered.

If we apply the first method of weighting to the example
given above and assume that the weights given to each
commodity are 2, 6, 3, 7 and 2 respectively, then the following
index numbers will be obtained for 1935 figures:


Commodity     Price Relatives     Weights
                1931 = 100

A                  110               2
B                  100               6
C                  100               3
D                  109               7
E                  200               2

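The index number by the arithmetic mean:

\[
\frac{(110 \times 2) + (100 \times 6) + (100 \times 3) + (109 \times 7) + (200 \times 2)}{20}
= \frac{2{,}283}{20} = 114.15
\]

The index number by the geometric mean:

\[
\sqrt[20]{110^{2} \times 100^{6} \times 100^{3} \times 109^{7} \times 200^{2}}
\]

\[
= \text{Antilog} \; \frac{4.0828 + 12.0000 + 6.0000 + 14.2618 + 4.6020}{20}
= \text{Antilog} \; \frac{40.9466}{20}
= 111.5
\]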
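
The two weighted averages above may be checked with the short Python sketch below. It uses the relatives and the assumed weights (2, 6, 3, 7 and 2) of the example; the names `weighted_am` and `weighted_gm` are ours.

```python
import math

relatives = [110, 100, 100, 109, 200]   # price relatives, 1931 = 100
weights = [2, 6, 3, 7, 2]               # assumed weights of the example
total_weight = sum(weights)

# Weighted arithmetic mean of the relatives.
weighted_am = sum(r * w for r, w in zip(relatives, weights)) / total_weight

# Weighted geometric mean: antilog of the weighted mean of the logarithms.
weighted_log = sum(w * math.log10(r) for r, w in zip(relatives, weights))
weighted_gm = 10 ** (weighted_log / total_weight)

print(round(weighted_am, 2), round(weighted_gm, 1))
```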






The second method of weighting can be illustrated by the
following example:

Commodity                       Price Relatives
                                  1931 = 100

Cereals
    A                               120
    B                               140
    C                               110
    D                               200

Industrial raw materials
    R                               130
    S                               110

Manufactured articles               140

\[
\text{The Index by A.M.} = \frac{120 + 140 + 110 + 200 + 130 + 110 + 140}{7}
= \frac{950}{7} = 135.714
\]

\[
\text{The Index by G.M.} = \text{Antilog} \;
\frac{2.0792 + 2.1461 + 2.0414 + 2.3010 + 2.1139 + 2.0414 + 2.1461}{7}
\]

\[
= \text{Antilog} \; \frac{14.8691}{7} = \text{Antilog} \; 2.1242 = 133
\]
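
The second method of weighting amounts to letting a group count as many times as it has quotations. A small Python sketch, using the quotations of the example and names of our own choosing, makes the implied weights visible.

```python
import math

# Price relatives (1931 = 100) grouped as in the example.
quotations = {
    "Cereals": [120, 140, 110, 200],
    "Industrial raw materials": [130, 110],
    "Manufactured articles": [140],
}

# Every quotation enters the average once, whatever its group.
all_relatives = [r for group in quotations.values() for r in group]

am_index = sum(all_relatives) / len(all_relatives)
gm_index = 10 ** (sum(math.log10(r) for r in all_relatives) / len(all_relatives))

# The weight of a group is simply its number of quotations.
group_weights = {name: len(group) for name, group in quotations.items()}

print(round(am_index, 3), round(gm_index, 1), group_weights)
```

Here cereals, raw materials and manufactures are in effect weighted 4 : 2 : 1.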

It may be stated that a single set of weights cannot be
employed for all purposes for which price indices may be
computed. If the computation of price index is desired to
indicate the variation in the factory workers' cost of living
then the weights assigned to the various commodities will be
in proportion to their respective importance in the workers'
budget. If, on the other hand, the executives of a
manufacturing establishment desire to compute a price index of
the commodities they manufacture, the weights are fixed
by the relative proportion of their various products in their
total turnover.

The weights based on quantities (produced, marketed or
consumed) may be further subdivided into:

1. Physical volume weights and

2. Value weights.

In the former type of weighting, the weights, i.e., the
quantities, are multiplied by the prices in each year. The total
of the weighted prices of the given year is reduced to a relative
of the base year total of weighted prices.

In the latter type of weighting, physical volume weights
are translated into value weights. "The weight assigned to
a commodity is the proportion which the base year value
of that commodity bears to the combined base year value
of all commodities." The calculations are shown in the table
below:

TABLE 17.6

Showing the Determination of Value Weights

                          Base Year
            Quantity produced                          Value          Proportion of
Commodity   in 1931               Price in Rs.         (Rs.)          total base year
            (000's omitted)                                           value × 100
                  q0                   p0              q0 × p0
                   1                    2                 3                 4

A           3,000 mds.            10 per md.            30,000            10
B           15,000 seers           4 per seer           60,000            20
C           2,500 chhataks         6 per chhatak        15,000             5
D           8,455 mds.            11 per md.            93,005            31
E           17,000 seers           6 per seer         1,02,000            34



The base year quantity ($q_0$) is multiplied by the base year
price ($p_0$), both quoted in the same unit, to secure the value of
each commodity produced ($p_0 q_0$) in the base year. These
values are totalled, $\sum (p_0 q_0)$, to secure the total value of
all commodities in the base year. The value of each commodity
is then divided by this total, $\frac{p_0 q_0}{\sum (p_0 q_0)}$, and
multiplied by 100. Column (4) of the above table shows the
proportion which the value of each commodity bears to the total
value of all commodities in the base year. These are the value
weights.
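
The determination of value weights may be sketched in Python as below. The quantities and prices are those of Table 17.6, with B taken as 15,000 seers, the figure consistent with its value of Rs. 60,000; the names `values` and `value_weights` are ours.

```python
# Base year quantities (000's omitted) and prices in rupees, as in Table 17.6.
quantities = {"A": 3000, "B": 15000, "C": 2500, "D": 8455, "E": 17000}
prices = {"A": 10, "B": 4, "C": 6, "D": 11, "E": 6}

# Value of each commodity in the base year: quantity times price.
values = {name: quantities[name] * prices[name] for name in quantities}
total_value = sum(values.values())

# Value weight: share of each commodity in the total value, times 100.
value_weights = {name: 100 * v / total_value for name, v in values.items()}

for name in value_weights:
    print(name, values[name], round(value_weights[name]))
```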

Physical volume weights are employed in the aggregative
method, whereas value weights are employed in the average of
relatives method. In the above illustration the value weights
have been obtained by multiplying the base year price with the
base year quantity, i.e., $p_0 q_0$. These weights may also be
obtained by any of the following methods:

Base year price × given year quantity: $\dfrac{p_0 q_1}{\sum (p_0 q_1)}$

Given year price × given year quantity: $\dfrac{p_1 q_1}{\sum (p_1 q_1)}$

Given year price × base year quantity: $\dfrac{p_1 q_0}{\sum (p_1 q_0)}$

The Weighted Aggregate of Price Index 

According to this method the price of each commodity for
each year is multiplied by its weight (the quantity in the base
year). The figures so obtained are totalled for each year
separately. The ratio of the total for the given year to the
total value in the base year is the weighted index number. It
should be remembered that weights (used here) are physical
quantities. The weight for commodity A, 30,00,000 maunds,
is multiplied by the 1931 price, Rs. 10 per maund, to secure
the value of A for that year; thus the value of each commodity
for each year is ascertained. Table 17.7 will reveal clearly the
process of computation.



TABLE 17.7

[Table illegible in this copy. For each commodity it gives the
base year quantity, the price in each year and the resulting
value (000's omitted), with the total value for each year.]




This method may be stated as:

\[
\text{Weighted aggregative index number}
= \frac{p_1 q_0 + p_1' q_0' + p_1'' q_0'' + \cdots}{p_0 q_0 + p_0' q_0' + p_0'' q_0'' + \cdots}
\quad \text{or} \quad
\frac{\sum (p_1 q_0)}{\sum (p_0 q_0)}
\]

If the quantities of the current year ($q_1$) are used as weights
the formula would be:

\[
\frac{\sum (p_1 q_1)}{\sum (p_0 q_1)}
\]


The steps may be summarised as:

1. Select a base year.

2. Decide the weight for each commodity, taking into
consideration the object of index numbers.

3. Multiply each of the commodity prices by the weight
paired with it.

4. Total the values so obtained for each year.

5. Divide each total by the base year total and multiply
the quotient by 100 to translate the index to percentage
form.

It should be noted, while computing index numbers with
this technique, that the prices and weights for a single
commodity must be quoted in the same units. This technique is
quite satisfactory because the units in which prices are quoted
do not alter the true state of affairs.
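
The five steps above may be sketched in Python as follows. The base year quantities and prices are those of Table 17.6 (with B taken as 15,000 seers); the 1935 prices of Rs. 11, 4, 6, 12 and 12 are an assumption, being the prices that correspond to the 1935 relatives of the illustration. The function name `weighted_aggregate_index` is ours.

```python
# Step 2: weights are the base year quantities (000's omitted).
quantities = {"A": 3000, "B": 15000, "C": 2500, "D": 8455, "E": 17000}

# Prices in rupees per unit; the 1935 prices are assumed from the relatives.
prices = {
    1931: {"A": 10, "B": 4, "C": 6, "D": 11, "E": 6},
    1935: {"A": 11, "B": 4, "C": 6, "D": 12, "E": 12},
}


def weighted_aggregate_index(year, base_year=1931):
    """Ratio of the weighted price totals of two years, in percentage form."""
    # Steps 3 and 4: multiply each price by its weight and total.
    total = sum(prices[year][k] * quantities[k] for k in quantities)
    base_total = sum(prices[base_year][k] * quantities[k] for k in quantities)
    # Step 5: divide by the base year total and multiply by 100.
    return 100 * total / base_total


index_1935 = weighted_aggregate_index(1935)
print(round(index_1935, 1))
```

Quoting a price per seer instead of per maund changes the price and the quantity in opposite directions, so the products, and hence the index, are unaffected.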


The Weighted Mean of Relatives Price Index

This method is also merely an extension of the simple mean
of relatives method. As previously stated this index makes
use of value weights instead of physical volume weights.
Table 17.8 will indicate the process of computation of this
index number.

The price relative of each commodity for each year is
computed in the manner already explained.



TABLE 17.8

Index Number based on Weighted Mean of Relatives

Each relative is multiplied by the value of the weight
which is paired with it. The weighted relatives are then
summed. The total of weighted relatives for each year is
then divided by the total of value weights. The resultant
figure will give the index. The steps in the computation of
index number through this process are summarised hereunder:

(1) Having ascertained the number of commodities and
the base year, it is necessary to secure the prices of
commodities for each year and their production or
consumption estimates for the year selected as base.

(2) In order to secure value weights:

    (a) Multiply the base year price by the base year
    quantity of each item and total to obtain the
    value of all items combined.

    (b) Then the value of each item in the base year must
    be divided by the total value of all items in the
    base year.

    (c) The results obtained by step (b) for each
    commodity then should be multiplied by 100. This
    step provides the percentage which the value of
    each commodity bears to the total value.

(3) Translate the series of prices into price relatives by
dividing the price of each commodity by its base year
price and multiplying the quotient by 100.

(4) The price relatives obtained by the third step should
then be multiplied by the value weights secured by the
second step.

(5) The results obtained by the fourth step then should
be totalled.

(6) The totals secured by the fifth step then should be
divided by the sum of the weights. In the illustration
the sum of the weights taken is 100.
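
Steps (4) to (6) may be sketched in Python as below, taking the rounded value weights of Table 17.6 (10, 20, 5, 31 and 34) and the 1935 price relatives of the illustration. The names used are ours.

```python
# Value weights (percentages of base year value) and 1935 price relatives.
value_weights = {"A": 10, "B": 20, "C": 5, "D": 31, "E": 34}
relatives_1935 = {"A": 110, "B": 100, "C": 100, "D": 109, "E": 200}

# Step 4: multiply each relative by its value weight.
weighted = {k: relatives_1935[k] * value_weights[k] for k in value_weights}

# Steps 5 and 6: total the products and divide by the sum of the weights.
weighted_mean_index = sum(weighted.values()) / sum(value_weights.values())

print(round(weighted_mean_index, 2))
```

Since each weight is proportional to $p_0 q_0$ and each relative to $p_1 / p_0$, their product is proportional to $p_1 q_0$; the result therefore agrees, apart from rounding, with the aggregate of 1935 prices weighted by base year quantities.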

Weighted Geometric Mean of Relatives

The same system of value weights may be used in
computing the weighted geometric mean of relatives. In this case
the logarithms of the relatives are multiplied by the weights.
Then the products of the value weights and logarithms of
relatives are added. Next the total for each year is divided
by the total of the value weights. The quotients will give the
logarithms of the index. If we find out the antilogarithms,
we will get the indices of price based on weighted geometric
mean of relatives.

Bias in Weighted Index Numbers

We have seen earlier that in the so-called 'unweighted'
index numbers the technique of construction will introduce a
bias in the results. When a weighting system is employed in the
computation of index, another kind of bias, called 'weight
bias', appears. In order to distinguish the two types of bias,
the one present in the apparently 'unweighted' index is known
as 'type bias'. Thus 'type bias' is the outcome of implicit
weighting, whereas the 'weight bias' is the result of explicit
weighting.

When we use given year values as weights the bias will be
upward. It will be downward when weights are base year
values.

Factor Reversal Test

We have already discussed one test, namely, 'time reversal
test', in which 'the index number reckoned forward should be
the reciprocal of that reckoned backward'. This test is not
met by any of the weighted types discussed before. Irving
Fisher has, however, suggested one more test, viz., the 'factor
reversal test' to be applied to weighted index numbers.
Concerning this test he wrote:

"Just as our formula should permit the interchange of
the two times without giving inconsistent results, so it ought
to permit interchanging the prices and quantities without giving
inconsistent result, i.e., the two results multiplied together
should give the true ratio."



In simple words the test is satisfied if the product of the
price index and the quantity index is equal to the ratio of the
aggregate value (quantity × price) in the current year to the
aggregate value in the base year.

Algebraically:

\[
P_{01} \times Q_{01} = \frac{\sum p_1 q_1}{\sum p_0 q_0}
\]

$P_{01}$ standing for the price change for the current year over
the base year.

$Q_{01}$ standing for the quantity change for the current year
over the base year.

$\sum p_1 q_1$ standing for the total value in the current year.

$\sum p_0 q_0$ standing for the total value in the base year.

Neither the simple nor the weighted form of any index
number (arithmetic, geometric, harmonic) satisfies this test.
None of the index numbers dealt with in the preceding pages
satisfies this factor reversal test. An index number which can
satisfy this test is suggested by Irving Fisher himself.

The Ideal Index

Prof. Irving Fisher has formulated a technique of index
number construction which is free from weight bias as well as
type bias. His formula employs "two indices, both of which
err, but err in opposite directions, so that when a geometric
mean is taken of the two, the errors are found to compensate
in such a way as to permit the index to meet both the 'time
reversal' and 'factor reversal' tests." The formula is:

\[
\text{The Ideal Price Index} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}}
\]

\[
\text{Similarly, the Ideal Quantity Index} = \sqrt{\frac{\sum q_1 p_0}{\sum q_0 p_0} \times \frac{\sum q_1 p_1}{\sum q_0 p_1}}
\]

the subscript ($_1$) refers to the current year,

the subscript ($_0$) refers to the base year,

the letter ($p$) refers to the price of a particular commodity,

and

the letter ($q$) refers to the quantity of that commodity.




Therefore,

$\sum p_1 q_0$ : current year price × base year quantity
$\sum p_1 q_1$ : current year price × current year quantity
$\sum p_0 q_0$ : base year price × base year quantity
$\sum p_0 q_1$ : base year price × current year quantity

The main merit of this index is that it satisfies the factor
reversal test as well as the time reversal test. This can be
illustrated by the example below:


TABLE 17.9

             Base Year         Current Year
            p0       q0        p1       q1        p0q0    p1q0    p0q1    p1q1
                 (millions)         (millions)

A           10        3        11        3          30      33      30      33
B            4       15         4       12          60      60      48      48
C            6        3         6        4          18      18      24      24
D           11        8        12        7          88      96      77      84
E            6       17        12       12         102     204      72     144

Total                                               298     411     251     333
                                                 Σp0q0   Σp1q0   Σp0q1   Σp1q1


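Factor Reversal Test is satisfied if:

\[
P_{01} \times Q_{01} = \frac{\sum p_1 q_1}{\sum p_0 q_0}
\]

where $P_{01}$ stands for the price change for the current year
over the base year, and $Q_{01}$ for the quantity change for the
current year over the base year.

Now according to Fisher's ideal index number formula:

\[
P_{01} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}}
\quad \text{and} \quad
Q_{01} = \sqrt{\frac{\sum q_1 p_0}{\sum q_0 p_0} \times \frac{\sum q_1 p_1}{\sum q_0 p_1}}
\]

Hence,

\[
P_{01} \times Q_{01} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}
\times \frac{\sum q_1 p_0}{\sum q_0 p_0} \times \frac{\sum q_1 p_1}{\sum q_0 p_1}}
\]

Substituting the values from Table 17.9, we get:

\[
P_{01} \times Q_{01} = \sqrt{\frac{411}{298} \times \frac{333}{251} \times \frac{251}{298} \times \frac{333}{411}}
= \sqrt{\frac{333 \times 333}{298 \times 298}} = \frac{333}{298}
\]

Now $\frac{\sum p_1 q_1}{\sum p_0 q_0}$ is also equal to $\frac{333}{298}$.

Thus it is proved that Fisher's ideal formula for index
number satisfies the factor reversal test.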

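That Fisher's ideal index satisfies the 'Time Reversal Test'
can also be seen from the following illustration:

Price index for the current year (1935) with 1931 as base
in the preceding illustration is:

\[
P_{01} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}}
\]

Price index for 1931 with 1935 as base year:

\[
P_{10} = \sqrt{\frac{\sum p_0 q_1}{\sum p_1 q_1} \times \frac{\sum p_0 q_0}{\sum p_1 q_0}}
\]

Time Reversal Test is: $P_{01} \times P_{10} = 1$.

\[
P_{01} \times P_{10} = \sqrt{\frac{411}{298} \times \frac{333}{251} \times \frac{251}{333} \times \frac{298}{411}}
= \sqrt{1} = 1
\]

Thus we see that the indices prepared according to Fisher's
ideal formula satisfy the Time Reversal Test also.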
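
Both tests can be verified numerically with the Python sketch below. It uses the prices and quantities of Table 17.9; the helper names `aggregate` and `ideal_index` are ours. Note that the products of the last column of the table (33, 48, 24, 84 and 144) add up to 333.

```python
import math

# Prices and quantities (millions) of commodities A to E from Table 17.9.
p0 = [10, 4, 6, 11, 6]
q0 = [3, 15, 3, 8, 17]
p1 = [11, 4, 6, 12, 12]
q1 = [3, 12, 4, 7, 12]


def aggregate(p, q):
    """Sum of price times quantity over all commodities."""
    return sum(a * b for a, b in zip(p, q))


def ideal_index(p_base, q_base, p_cur, q_cur):
    """Geometric mean of the base-weighted and current-weighted aggregates."""
    base_weighted = aggregate(p_cur, q_base) / aggregate(p_base, q_base)
    current_weighted = aggregate(p_cur, q_cur) / aggregate(p_base, q_cur)
    return math.sqrt(base_weighted * current_weighted)


price_index = ideal_index(p0, q0, p1, q1)       # P01
quantity_index = ideal_index(q0, p0, q1, p1)    # Q01: roles of p and q interchanged
backward_index = ideal_index(p1, q1, p0, q0)    # P10: the two years interchanged

value_ratio = aggregate(p1, q1) / aggregate(p0, q0)

print(round(price_index * quantity_index, 4), round(value_ratio, 4))
print(round(price_index * backward_index, 4))
```

The quantity index is obtained from the same function merely by interchanging prices and quantities, which is the interchange contemplated by the factor reversal test.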

Problems in Index Number Construction

The construction of index numbers involves the
consideration of the following important problems:

1. The Purpose of Construction. Before taking up the work
of constructing an index number in hand the object for which
it is to be computed must be defined as precisely as possible.
The nature of the object will be the determining factor as
regards the raw material which is to be used for obtaining the
desired index. If we are not clear about the purpose, the data
used may be unsuitable and the indices obtained may be
misleading. Thus if it is intended to measure the rise in the cost
of living during a certain period we must be definite as to
whose cost of living we are considering, whether it is the
living cost of the middle class people, agriculturists, artisans or
industrial workers. Such definiteness is necessary for the
importance of various items consumed by the different
categories of people may be very much different. A rise in the
price of luxury articles will not affect the cost of living of the
poor people but it will certainly have an effect on the budget of
the richer section of the community. The object of construction
will also have a determining influence on the number and
type of commodities chosen, the selection of base period, and
the system of weighting that is to be adopted.

2. Selection of the Number and Types of Commodities. So far
as the question of the number of commodities is concerned no
hard and fast rules can be prescribed. But it can be stated in
this connection that the number of commodities should be
enough to permit the influence of the inertia of large numbers.
(The larger the size of the sample the greater is the possibility
of its being representative of the whole population.) But the
number of commodities included must not be so large as to
make the work of computation uneconomical and even difficult.
The number of commodities should, therefore, be reasonable.
The reasonableness of the number depends primarily upon the
purpose for which the index number is to be constructed. If
it is going to be a sensitive index a smaller number of
commodities would be reasonable. Economic Adviser's sensitive
index includes only 21 commodities. If, on the other hand, it
is a general purpose index the number should be larger. Thus
Economic Adviser's revised index of wholesale prices is based
on 215 commodities.

In the matter of selection of commodities attention should
be directed to the following points:

(1) Cost of obtaining the quotation,

(2) Selected commodities must be fairly representative of
the phenomenon under investigation,

(3) The commodities should be such as remain uniform in
quality from year to year, and

(4) The commodities must be selected with reference to
their relative importance for the object in view.

There may be several varieties of a selected commodity. We
should include the most popular variety. More than one
variety may also be included if it is desired to give greater
weightage to a particular commodity.

3. Prkt QvoUxtums. The problem of collecting suitable 
price quotations for the commodities selected is somewhat more 
difficult. Since it is neither possible nor necessary to collect 
the price of a commodity from all the markets in the country 
where it us bought and sold we should take a sample of the 
markets. In selecting a sample care should be taken to see 
that the markets included are such a* are well known for 
trading in that particular commodity. Once we have decided 
about the markets from where the quotations are to be collected, 
the next thing is to select a suitable price reporting agency. 
There may be a number of agenc ies that may be reporting price 
quotations, via. , business houses, Chambers of Commerce, news 
correspondents, etc. Our endeavour should be to select an 
agency' which may be most reliable. To check the accuracy of 
price quotations supplied by an agency it will be advisable to 
obtain such quotations from more than one reporting agency. 
Price quotations should always be for the same quality of the 
commodity, for a change in quality may mean a considerable 
difference in price. 

In order to facilitate the construction of index numbers,
prices should always be quoted as so much money per unit of
commodity, e.g., Rs. 20 per maund or so many annas per seer, and
not as so many units of a commodity per unit of money (e.g., 2
seers per rupee). Another important point to be decided is
whether the prices to be used in the construction of index
numbers should be wholesale or retail. Wholesale prices should
be preferred to the retail ones because they fluctuate less and
are more sensitive to conditions of demand and supply as compared
to retail prices.
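The rule about quoting money per unit can be illustrated with a small sketch. The function name and the second quotation (1.6 seers per rupee) are hypothetical and are not taken from the text; only the "2 seers per rupee" figure comes from the example above.

```python
def price_per_unit(units_per_rupee: float) -> float:
    """Convert a quotation of 'units per rupee' into 'rupees per unit'."""
    if units_per_rupee <= 0:
        raise ValueError("quantity per rupee must be positive")
    return 1.0 / units_per_rupee


# 2 seers per rupee is the same thing as Rs. 0.50 per seer.
old_price = price_per_unit(2.0)

# Hypothetical later quotation: only 1.6 seers can be had for a rupee.
new_price = price_per_unit(1.6)

# The price relative must be formed from prices, not from quantities:
# the quantity quotation fell (2 -> 1.6) while the price actually rose.
price_relative = new_price / old_price * 100
print(round(price_relative, 1))
```

A quotation in units per rupee moves in the opposite direction to the price, which is why it has to be converted before any price relative is computed.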

4. The Selection of Base. Since the base period serves as a
reference period and the prices for a given year are expressed
as percentages of those for the base year it is necessary that

(i) the base period should be normal, and

(ii) it should not be too far in the past.

The base year should be normal because if it is not so (i.e.,
if it is influenced by some unusual factors), all the other indices
that are related to this year will be distorted as a result of the
abnormal conditions then prevailing. It is not easy to select
a year which may strictly be called normal. If a year is
normal in one respect it may very possibly be abnormal in
some other respect. In order to overcome this difficulty
an 'average of a number of years' is generally taken as the
base. This average is more representative and is less affected
by chance variations.
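As a rough illustration of taking an average of several years as the base, the sketch below uses invented prices of a single commodity; the figures and variable names are assumptions made only for the example.

```python
# Hypothetical prices of one commodity (Rs. per maund); not from the text.
prices = {1931: 20.0, 1932: 24.0, 1933: 22.0, 1934: 26.0, 1935: 28.0}

# The base is the average price of the first three years, not a single year.
base_years = (1931, 1932, 1933)
base_price = sum(prices[y] for y in base_years) / len(base_years)

# Each year's price expressed as a percentage of the average base price.
indices = {year: round(p / base_price * 100, 1) for year, p in prices.items()}

for year, value in indices.items():
    print(year, value)
```

No single year need equal 100 under this method; here 1933 does so only because its price happens to coincide with the three-year average.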

Fixed and Chain Base Indices

The base may be fixed or changing. It is said to be fixed
when the indices for different periods are computed on the
basis of the prices of the base year. Thus if the indices
for 1932, 1933, 1934, 1935 are calculated with 1931 as the
base year such indices will be called fixed base indices. In the
illustration given on preceding pages indices have been calculated
on the fixed base method.

If, however, the whole series of index numbers is not
related to any one base period, but the indices for different
years are derived by relating each year's value to that of the
immediately preceding year, the indices so obtained are called
link relative index numbers. Frequently these link relatives
are chained together to a common base. Such indices are
known as chain indices. By applying this method to the series
of quotations given in table 17.2 the index numbers will be as
given in table 17.10.

The Method of Computation 


According to the chain base method the price relatives for
any year are computed on the basis of the prices for the year
just preceding it. Thus the relatives for 1932 are computed
with 1931 prices as the base, relatives for 1933 with 1932
prices as the base and so on. When the price relatives are
computed according to this method they are called 'link relatives'.
These link relatives are given from line 1 to 5 in table
17.10. In line 6 are given their totals and in line 7 are given
the averages of these link relatives for each year which are
obtained by dividing the total given in line 6 by 5 (the number
of commodities). If now it is desired to relate them all to a
common base (say, 1931 in our illustration) these averages may
be placed in a chain. The chain relatives, so obtained, will be
the indices on the chain base method in respect to the year
1931. The method of chaining together the link relatives is as
follows :

The average link relative for 1932 with 1931 as base is 120.
This figure will remain the same for the simple reason that
it is already related to 1931.

The average link relative for 1933 with 1932 as base is 90.
This means that if the 1932 prices are represented by 100 the
1933 prices are represented by 90. If the 1932 prices are
represented by 120 the figure for 1933 prices will be

(120/100) x 90 = 108

The chain relative for 1934 = (108/100) x 125 = 135

The chain relative for 1935 = (135/100) x 102 = 137.7
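The chaining process can be sketched in a few lines of code. The function name is an assumption of this illustration; the average link relatives (120, 90, 125 and 102 for 1932 to 1935) are those used in the working above, with 1931 taken as 100.

```python
def chain_indices(link_relatives, start=100.0):
    """Chain average link relatives to a common base.

    Each chain index is the previous chain index multiplied by the
    current link relative and divided by 100.
    """
    chained = []
    previous = start
    for link in link_relatives:
        previous = previous * link / 100.0
        chained.append(round(previous, 1))
    return chained


years = [1932, 1933, 1934, 1935]
links = [120, 90, 125, 102]          # average link relatives
for year, index in zip(years, chain_indices(links)):
    print(year, index)               # 120.0, 108.0, 135.0, 137.7
```

Because every figure is built on the one before it, a mistake in any link relative is carried into all later chain indices.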




Merits and Demerits of the Chain Base Method

The main advantages of the chain method are two :

(1) Under this method the index for the current year is
related to the year immediately preceding it. This enables us
to know the extent of the change that has come in the current
year as compared to the previous year. This is certainly more
useful to business than a fixed index which is related to a year
of the distant past.

(2) Under this method it is possible to introduce new
items or drop out old ones without having to recalculate the
whole series. This is because the index of any
one year is related only to the year just preceding it and the
changes occurring in neighbouring periods are never so great as
to impair comparability. Thus if the list of commodities needs
frequent change the chain base method is preferable to the
method of fixed base.

This method, however, involves lengthy calculations and if
an error is committed it tends to be perpetuated in the chaining
process.

The Choice of an Average

Since an index number is a technique of 'averaging' all the
changes in a group of series over a period of time the main
problem is to select an average which may be able to summarise
the changes in the component series adequately. In
the chapter on 'Central Tendency' we have discussed in
considerable detail the characteristics of the different types of
averages. We have pointed out in this chapter the
relative merits and demerits of the different kinds of averages
in the construction of index numbers. Geometric mean is
superior to arithmetic mean in several respects, but due to the
difficulty in its computation it is not widely used for this
purpose.
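One respect in which the geometric mean may be regarded as superior is reversibility, which the exercises at the end of this chapter also take up. The sketch below is only an illustration: the three price relatives are invented, and the functions come from Python's standard `statistics` module.

```python
from statistics import fmean, geometric_mean

# Hypothetical price relatives of three commodities (base year = 100).
forward = [200.0, 50.0, 100.0]

# The same comparison reversed, i.e. with the current year as base.
backward = [100.0 * 100.0 / r for r in forward]

am_forward, am_backward = fmean(forward), fmean(backward)
gm_forward, gm_backward = geometric_mean(forward), geometric_mean(backward)

# For a reversible index, forward x backward / 100 should come back to 100.
print(round(am_forward * am_backward / 100, 1))   # above 100: not reversible
print(round(gm_forward * gm_backward / 100, 1))   # 100.0: reversible
```

With the arithmetic mean the forward and backward indices do not cancel out, whereas with the geometric mean they do.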

Base Shifting, Splicing and Deflating

Base Shifting. Many times it becomes necessary to shift
the base of a series of Index Numbers from one period to
another. For instance, let a series of indices, say, of cost of
living, have 1949 as its base and its value in 1952 and 1960 be
150 and 300 respectively. Let another series of indices, say, of
production, have a base 1952 and its value in 1960 be 200. From
these figures one may conclude that as the change in production
from 1952 to 1960 is of 100 points (200 - 100) and the change in
cost of living is of 150 points (300 - 150), the change in the latter
series is greater. But this conclusion is not correct as the two
series have different base periods. To have valid comparisons
it will be necessary to convert the cost of living series into a
new series with 1952 as the base year, i.e., the base of this series
should be shifted to 1952. (With 1952 as base the cost of living
index for 1960 works out to 300 x 100/150 = 200, i.e., the same
proportionate rise as in production.)

The best method of base shifting which will give correct
results is to reconstruct the series with the new base. This
means that for each year relatives corresponding to each commodity
included in that index number are recomputed on the
new base and then averaged out. This new average will give
the appropriate index number. But this process is very
lengthy and may not be possible to apply in all cases. Another
method may be followed, which gives nearly the same results
when arithmetic mean is used for averaging and gives exactly
the same result as the first method when geometric mean is
used for averaging. The method is as under :

Divide each index number of the series by the index number
of the time period selected as new base and multiply the result
so obtained by 100. The figure thus obtained will give the
required series with the new base.


Let us explain it further with the help of an example. Let
the index numbers for various years with 1939 as base for a
certain commodity be as follows :

Year       1939   1940   1945   1950   1955   1960
In. Nos.    100    111    126    150    162    180


It is desired to shift the base to the year 1950. 

Let us make the calculations for 1955. If 1950 is to be the
new base, its index number must be 100. But in the old series
it is 150 and the index number for 1955 is 162. The problem
stated in simple terms is to determine the index for 1955 when
the index for 1950 is changed from 150 to 100,
i.e., If the figure for 1950 is 150, the figure for 1955 = 162

If the figure for 1950 is 1, the figure for 1955 = 162/150

If the figure for 1950 is 100, the figure for 1955
    = 162 x 100/150 = 108

108 is the index number for 1955 with 1950 as base.

Similarly the index number for 1960 with 1950 as base
will be 180 x 100/150, i.e., 120, index number for 1945 will be
126 x 100/150, i.e., 84 and so on. The new series with its base
shifted to 1950 is thus :

Year       1939   1940   1945   1950   1955   1960
In. Nos.   66.7   74.0   84.0  100.0  108.0  120.0
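The short-cut rule for base shifting can be sketched as follows. The function name is an assumption of this illustration, and only those years are used for which the text itself states the figures (the old-series values 100, 126, 150, 162 and 180).

```python
def shift_base(indices, new_base_year):
    """Divide every index by the index of the new base year and multiply by 100."""
    base_value = indices[new_base_year]
    return {year: round(value * 100.0 / base_value, 1)
            for year, value in indices.items()}


# Old series with 1939 = 100.
old_series = {1939: 100, 1945: 126, 1950: 150, 1955: 162, 1960: 180}
new_series = shift_base(old_series, 1950)

for year, value in new_series.items():
    print(year, value)
```

The new base year comes out as 100 and 1955 as 108, agreeing with the working shown above.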

Splicing two Index Number Series. It is usually found that in
course of time some articles included in an index number
series may go out of the market. New ones may come in.
Their relative importance may also change. When these
changes become sufficiently important their inclusion in the
index number becomes necessary. As a consequence the old
series of index numbers is discontinued and a new series is
constructed with the year of discontinuation of the first as
base. This means that we now have two series of index
numbers for the same phenomenon, one of them coming
up to the year from which the other begins. The index
numbers contained in the two series are not directly comparable
for the simple reason that they are prepared on different bases.
In order to facilitate the comparison these two series are put
together in one continuous series, i.e., the two series are spliced
together. The method for doing this is :

Multiply the various indices of the new series by the index
number of the last year in the old series and divide the result
so obtained by 100.







Let us explain it with the help of an example : 


Year       1939   1940   1945   1950   1955
Series A    100    120    150     —      —
Series B     —      —     100    112    136


Here series A was discontinued in 1945 and in that year a
new series was started. It is desired to splice the two series.

Let us make calculations for 1950.

When In. No. for 1945 is 100, In. No. for 1950 = 112 (given
by series B)

∴ When In. No. for 1945 is 150, In. No. for 1950 = 112 x 150/100
    = 168

168 becomes the index number for 1950 in the spliced
series. The two series spliced in this way give the result as
follows :

Year             1939    1940    1945    1950    1955
Spliced Series  100.0   120.0   150.0   168.0   204.0

Sometimes instead of carrying series A forward, series B may
be brought backwards. In this case every figure of series A is
divided by the index number of the year in which the change takes
place and the result so obtained is multiplied by 100. In the
present example the two series spliced in this way give the result
as follows :

Year             1939    1940    1945    1950    1955
Spliced Series   66.7    80.0   100.0   112.0   136.0
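Both ways of splicing can be sketched in code. The function names are assumptions of this illustration; the data are series A (100, 120, 150) and series B (100, 112, 136) of the example above.

```python
def splice_forward(old_last_index, new_series):
    """Carry the old series forward: new index x last old index / 100."""
    return {year: round(value * old_last_index / 100.0, 1)
            for year, value in new_series.items()}


def splice_backward(old_series, old_last_index):
    """Bring the new series backwards: old index / last old index x 100."""
    return {year: round(value * 100.0 / old_last_index, 1)
            for year, value in old_series.items()}


series_a = {1939: 100, 1940: 120, 1945: 150}   # discontinued in 1945
series_b = {1945: 100, 1950: 112, 1955: 136}   # started in 1945

forward = {**series_a, **splice_forward(series_a[1945], series_b)}
backward = {**splice_backward(series_a, series_a[1945]), **series_b}

print(forward)    # one continuous series on the base of series A
print(backward)   # one continuous series on the base of series B
```

The two spliced series differ only in scale; the ratio between the figures of any two years is the same in both.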

Deflating. Deflating means making allowance for the effect
of changing price levels. Over a period of time wages may be
rising. But side by side the cost of living may also be increasing.
The real wages in this case would be less than the money
wages. To get the real wage figure one may reduce the money
wage figure to the extent the prices have risen. The rise in
price in this case may be best represented by the cost of living
index number. If the cost of living index number in a certain
year is double the base year figure then real wages for that
year (wages in terms of the price level as in the base year) would
be half the money wages. This process of decreasing a figure
with the help of index numbers so as to allow for change in
the price level is called deflating. In deflating only that index
number should be used which is appropriate to the given case.
In the above example if one decreases the actual wages in the
same proportion as the rise in gold prices, the correct real
wages will not be obtained.

The method for deflating a series of figures to the base
year level of a suitable index number series is to divide the
figures corresponding to various time periods of the given
series by the corresponding figure of the index number series
and multiply the result so obtained by 100. The example
given below will illustrate it further :




Year                        1949   1950   1955   1957   1961
Wages per month (Rs.)        120    125    150    170    215
Cost of Living In. No.       100    105    130    142    208


It is desired to deflate the monthly wages by the cost of 
living index number. 

Let us calculate the figure for 1955. In this year the wages
are Rs. 150 p.m. and the cost of living index number is 130. To
get the deflated income one has to proceed as below :

When index of cost of living is 130, wages = Rs. 150

If index of cost of living was 100, wages = (150/130) x 100
    = Rs. 115.4 approx.

Similarly the deflated income for 1961 = 215 x 100/208
    = Rs. 103.4


The deflated incomes for the various years are thus :

Year                     1949    1950    1955    1957    1961
Deflated income (Rs.)   120.0   119.0   115.4   119.7   103.4
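The deflating rule may be sketched as below. The function name is an assumption of this illustration; the wages and cost of living index numbers are those of the table in the example.

```python
def deflate(values, price_index):
    """Divide each figure by the index number of the same year and multiply by 100."""
    return {year: round(values[year] * 100.0 / price_index[year], 1)
            for year in values}


wages = {1949: 120, 1950: 125, 1955: 150, 1957: 170, 1961: 215}
cost_of_living = {1949: 100, 1950: 105, 1955: 130, 1957: 142, 1961: 208}

real_wages = deflate(wages, cost_of_living)
for year, value in real_wages.items():
    print(year, value)
```

Money wages rise throughout the period, but the deflated figure for 1961 is well below that for 1949, which is the point of the exercise.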









EXERCISES 

1. Define Index Number and show the importance and the use of general
index numbers.

2. What are economic barometers ? Show their importance in forecasting
economic events.

3. Describe briefly the problems that are involved in the construction of
an index number of prices.

4. Distinguish between fixed base and chain base methods of constructing
index numbers and describe their relative merits and demerits.

5. Describe briefly the various methods employed for constructing an
index number of prices.

6. What do you mean by reversibility of an index number ? Which
index numbers are reversible ?

7. What is meant by weighting in Statistics ? What are the various
ways of assigning weights in the construction of index numbers ?

8. What is meant by value weights ? Describe with an example the
weighted index number of wholesale prices.

9. What do you understand by Time Reversal Test and Factor Reversal
Test ?

10. Explain Fisher's 'Ideal' method of weighting index numbers and
describe the difficulties that are to be faced in using it.

11. Write short notes on :
(a) link relatives,
(b) chain relatives,
(c) base shifting,
(d) Implicit and Explicit weighting.

12. Explain the use of Index Numbers with the help of the following table
which gives the average annual wholesale prices of Jute in Calcutta in
rupees per bale of 400 lbs. for the period 1914 to 1930.


Vear 

Rupee* 

Year 

Rupee* 

Year 

Rupee* 

1914 

Vi 

1920 

m 

192b 

f» 

1915 

54 

1921 

94 

1927 

76 

19lt> 

67 

1922 

88 

1928 

7) 

1917 

56 

1923 

70 

1929 

71 

1918 

72 

1924 

76 

\m 

50 

1919 

uri 

1925 

112 




( BX:m , CsktrO*, t <*.$$} [4 1 ) 

13. Compare the Index Numbers of sales of two commodities A and B
by taking (i) 1925 as base, (ii) average of first three years as base,
(iii) 1935 as base. Sales given in maunds.


Year 

A B 

Sale* 

Year 

A » 
Sales 

Year 

A B 

Sale* 

1925 

70 

60 

1930 

75 

71 

1935 

80 

75 

im 

66 

m 

31 

79 

73 

% 

87 

78 

1927 

68 

69 

32 

70 

7* 

37 

m 

79 

1928 

67 

m 

33 

70 

75 

30 

90 

00 

1929 

n 

70 

M 

82 

75 

39 

91 82 

r*-2j 





14. The following gives the figures taken from the 'Capital Index of Indian
Industrial Activity' for April 1952. Calculate the Industrial Activity
Index Number of April 1952 by using (i) simple average, (ii) weighted
mean, (iii) simple geometric mean, (iv) weighted geometric mean.


Commodity 

Weight* 

In. No. 

Commodity Weight* In, No. 

Cotton 

9 

m i 

Wagon* leaded 24 

142-6 

Jute 

6 

1053 

Cheque 


m*H 

3 

>92*8 

Clearance 20 

94-3 

Pig Iron 

ft 

121*0 

Note* in 


Otttmat 

5 

223 2 

Circulation 6 

163 3 

Paper 

3 

173-9 

Omaumptkm of 

1.35 8 
{3.SJ 

Coal 

1 

178*6 

Electricity 7 

15. From the following data calculate a price index for the year 1938 by
using simple geometric mean.

Commodity 

A 

H C D E 

F 

Av. Price 

>930 bate 

year 16 1 

97 151 56 IW 

100-0 

Av. Price 

1938 

14 2 

$•? 125 4* 134 

1170 


Now reverse the process, taking 1938 as base year and 1930 as
current year and show that the two results are strictly consistent.

(JJ.fW, B«aott, 1951 ) [ 4 . 4 } 

16. Calculate from below the In. No. of 1940 and 1945 with year 1939 as
base by using for method of averaging (i) arithmetic mean, (ii) geometric
mean, (iii) median.

Also show that Index Number calculated on the basis of arithmetic
mean is not reversible while In. No. calculated on the basis of geometric
mean is reversible.


Commodity 

A 

it 

c r> 

K 

F 

G 

Price* 1939 

3-2 

4 4 

24 6*0 

10 

84 

HI 

Price* 1940 

Hi 

5 5 

;H> 60 

09 

63 

M 

Price* 1945 

b-4 

4 4 

1*2 2*0 

3*0 

21 

40 


17. From the following group average prices prepare Index Numbers with
a view to determine the amount of wages. Take 1913 as the base and
give the four groups weightage in the proportion of 8, 5, 3 and 2.

(Prices in Rupees)


Group unit 

19U 

19*4 

1913 

1916 

Fond per md. 

4-0--0 

4-0—0 

5—0—0 

(>-"—0—0 

Rent per rtwwn 

2—0— 0 

2-0-0 

3-0-0 

4—0-0 

Cloth per yd. 

O-Jfi-O 

0-4-0 

0-12—0 

0-12—0 

Mite, per unit 

2 4K-4) 

2-8—0 

3—4—41 

3-4-0 


Also calculate the index number by using geometric mean for the
method of averaging. [4.6]

18. The following are the group index numbers and the group weights of
an average working class family budget. Construct the cost of living
In. No. by assigning the given weights.




Group 



Food Fuol & Clothing H. R««t Mac, 

Lighting 

In, No. for 1942 352 220 TSO 160 190 

Weight* 46 10 820 12 15 

(4J) 

19. The following table gives the prices of 8 commodities in the base year.
Find the unweighted Index number of prices of the current year as
also the weighted index number, weights being proportional to the
value.


Quantity itt Untit Price* per Unit 


Commodity 

Bate Year 

Bate Year 

Current Year 

1 

2,692 

64 2 

72-3 

2 

831 

119 8 

HI 5 

3 

1*247 

598 

439) 

4 

183 

57 3 

67*8 

5 

345 

MM 

96-5 

6 

8.993 

109 

19 6 

7 

06 

14 l *0 

im 

0 

1,298 

18 2 

212 



M Go*., thlh «, 

fp) (4.8) 

20. Calculate the weighted Index Number of the current year in Ex. 19,
using quantities of base year as weights.

21. Prepare the Index Number of year 1953 with 1939 as base.


1939 

195:3 


Commodity 

Price* per Unit 

Price* per Unit 

Quart lily in 

R*. A* P 

R*. A*. P. 

Unit* 

1 

8 0 0 

12 0 0 

100 

2 

6 0 0 

7 8 0 

25 

3 

5 0 0 

3 4 0 

10 

4 

4« 0 0 

52 0 0 

20 

5 

13 0 0 

lt> 8 0 

65 

6 

19 0 0 

27 0 0 

30 




f4*9j 

22. Calculate the index number of price by (1) Laspeyres' Method,
(2) Paasche's Method, (3) Marshall's Method, (4) Marshall and Edgeworth
Method, and (5) Fisher's Method from the following data :


1935 

1945 

(•omutoilitv 

Price Quantity Price 

Quantity 

A 

4 

30 10 

40 , 

H 

% 

10 9 

2 i 

C 

2 

5 4 

2 


f4W 




23. Calculate Fisher's Ideal Index Number from the following data for
(i) Cotton, (ii) Jute, and (iii) Jute and Cotton taken together, taking
1931 as base.


COTTON JUTE 


Year 

Prim per 
Unit 

Value* of 
Tm»l Unit* 

Price* per 
Unit 

Value* of 
Total Unit* 

1991 

70 

m 

62 

120 

32 

42 

m 

59 

111 

31 

3b 

m 

.56 

91 

34 

50 

275 

74 

\m 


[Hint : Determine quantities by dividing values by prices.] [4.11]

24. Taking 1923 as the base period calculate an Index Number of prices
for the year 1931 from the following data given in appropriate units.




Commodity 

1923 

Quantity 

Price 

1931 

Quantity 1‘iice 

Wheat 

562 

170 

632 

72 

Rite 

535 

192 

1% 

70 

Sugar 

619 

195 

926 

95 

Cither 

128 

187 

255 

92 

Fuel 

542 

I«5 

632 

92 

Gold 

217 

150 

314 

180 



,R Cam., 

Dtlhi, t*) 5 ?) 

(4*2] 

25. Calculate the Index No. of prices using weighted relative method
taking (i) base year weights, (ii) current year weights for the data of
Exercise 22. [4.13]

26. Given the following data which Index Number would you use for the
purpose of comparison ? Give reasons.



Year 

Rice Wheat 

Jo w ar 

Price Qty. Price 

Qty. 

Price 

Qty. 

1927 

‘>*3 100 6*4 

H 

5.1 

5 

1934 

45 9 37 

10 

2 7 

3 




Af d,, t$V) 


27. Prove using the following data that the Factor Reversal Test is satisfied
by Fisher's Ideal Formula for Index Numbers.


Commodity 

Rase year 
Price 

Rase year 

Q»y 

Current year 
Pro* 

Current year 

Q)y 

A 

6 

50 

10 

56 

0 

3 

100 

2 

120 

c 

4 

60 

6 

60 

D 

10 

30 

12 

24 

£ 

b 

40 

12 

3ft 


(DM, , 9 33) O-OJ 





28. Using the data of Exercise 27 above show that Fisher's method satisfies
both the time reversal test and factor reversal test and no other
method satisfies both the tests. [4.15]

29. Use the following data of industrial production in India to compare
the annual fluctuation in Industrial Activity by Chain Base Method.

Year      In. No.    Year      In. No.    Year      In. No.
1919-20     120      1924-25     137      1929-30     162
1920-21     122      1925-26     136      1930-31     149
1921-22     116      1926-27     149      1931-32     160
1922-23     120      1927-28     156      1932-33     160
1923-24     120      1928-29     137         —

{M.Cmt, Luting , tgff:. 

Assuming 1919-20 to be 100 show that Chain Indices in respect
of 1919-20 calculated from the above data are the same as the Fixed Base
Numbers given above. [4.16]

30. (a) Calculate the link relatives from the following data of the
Labour Bureau Working Class Cost of Living Index Numbers of
Delhi.

Year 

In. No. 

Year 


In, NV 

1944 

100 

1949 


132 

1945 

103 

1950 


132 

im 

108 

1951 


M2 

1947 

122 

1952 


143 

1948 

132 

1953 


HO 


(b) Calculate fixed base index numbers from the following series of
link relatives.

Year 1950 1951 1952 1953 1954 1955 1956

Link Relatives 100 115 120 93 102 156 03 

31. You are given the following series of Index Numbers of price of four
commodities and an index number of the four taken together based on
average. Calculate new indices for seven years based on the Chain
Method.

                      Articles
Year    Sugar   Milk   Coffee   Tea   Total   Average
1921      81     77     119      55    332      83.0
1922      62     54     128      82    326      81.5
1923     101     87     114     100    402     100.5
1924      93     75     154      96    418     104.5
1925      60     43     165      88    356      89.0
1926      60     44     159      89    352      88.0
1927      62     47     139      84    332      83.0


l/ac. Ait., t$3& H J7] 


32. Calculate the chain index number and fixed base index number from
the following data and compare the two series.

Commodity Yean 



mt) 

1951 

193 ? 


19M 

A 

2 

H 

4 

y 

7 

H 

3 

t* 

9 

4 

3 

C 

1 

1 / 

20 

8 

16 

1 ) 

S 

? 

18 

n 

n 


H.MJ 

33. From the data given below calculate the cost of living index number
for the current year by the aggregate expenditure and family budget
method separately.



Quantity 


Price in 

Pi icr in 

Articles 

(Vmsumed m 

Cmt* 

Base tea? 

f 'nrrrnl vrh' 


Paw Year 


Rv 

R*. 

Rite 

3 mdv 

md* 

(> 

8 

Milieu 

S imh 

md* 

4 

3 

Wheat 

l rnd 

rnd * 

r i 

10 

Crain 

1 md 

md* 

i 

n 

Arbor 

1 n»d 

md* 

4 

•f 

Ollier Pultes 

? md* 

rnd* 

"5 

4 

Ghee 

4 seer s 

Verr 

1-23 

2 

(«i|t 

2 md* 

md*. 

2 30 

3 

Salt 

I ’C »erj* 

mds 

4 

'* 

Oil 

24 aee#* 

md* 

20 

2 i 

Clothing 

40 yds 

yd* 

0 

0 r .(i 

Fire WHwi 

10 md* 

imti 

■ i fit? 

0 till 

Krrnaefie Oil 

1 mi 

tin 

1 

t. 

If none Rent 


1 lnii»e 

12 

r» 


M-19) 

34. An enquiry into the budget of the middle class families in a city in
England gave the following information :


Expense on 

FiHWI 

Hem 

Cbrtbmg 

I )i ! Miw . 


•VVY 

13*. 

2C>\; 

U)\, 20* K 

Price HU* 

0 30 

did 

£75 

d.3 £40 

Puce IW 

CM > 

30 

t>3 

2.4 £13 

What changes in cost of living figures of 1929 as compared with
1928 are seen ?

' &.f «mm ,, / *« £n«»# , t<) 44 14.201 

35. Rewrite the following index numbers compiled by Labour Bureau for
certain cities by changing the base to year 1949.


Year 

Aimer 

Jh*ti« 

CuHtd 

Jabxlpu* 

1944 

100 

100 

loo 

IOO 

1943 

no 

n? 

107 

93 

mi. 

1 Hi 

U2 

JOb 

101 

194? 

IS2 

1.3d 

n? 

l?t 

194* 

1^1 

13* 

1 17 

I N> 

1949 

ltd 


HT 

151 

1930 

Hat 

HU 

It*.1 

13 A 

I9M 

17* 

im 

tMI 

168 

1932 

174 

*73 

»«*> 

130 

*933 

lo* 

m 


131 

fi,22J 





36. In 1920 a Statistical Bureau started an index of production based on
1914, with the following results :

Year     1914 (Base)   1920   1929
Index        100        120    200

In 1930 the Bureau reconstructed the index on a new plan with
base 1929.

Year     1929 (Base)   1935
Index        100        150

In 1936 the Bureau again reconstructed the index on yet another
plan with base 1935.

Year     1935 (Base)   1939
Index        100        120

It is required to splice these three series together so as to give a
continuous series with base 1935 = 100. Draw up a working table in
parallel columns and show the indices for 1914, 1920, 1929, 1935 and
1939. [4.23]


37. The following table gives per capita income and Cost of Living Index
for India from 1939-40 to 1947-48.


Year 

Pci Capita 

Con of IJvii 

l ncome 

liattr 10: 

1939-40 

67 

100 

1*HU-41 

:tt 

U*5 

1941-4? 

:n 

117 

1942-43 

U2 

KW 

101 '4-44 

139 

217 

1014-45 

139 

216 

19 0-441 

137 

219 

1946-4 7 

) 13 

242 

(9-17-4# 

IHO 

2 Ml 


Deflate the per capita income with reference to the Cost of Living
Index and represent the deflated income diagrammatically for the
period. (M.A., Allahabad)

[Hint : Deflating the money income with reference to the cost of
living index number means to calculate the income corresponding to
cost of living index equal to 100, viz., cost of living index number in
1942-43 is 160, income 112 will be deflated to (112/160) x 100 = 70.]

38. Prepare index numbers of prices for three years with the average
price as base.

                 Rate per Rupee (seers)
            Wheat    Cotton    Oil
1st year      10        4       3
2nd year       9        ?       ?
3rd year       9        3       2.5

(Agra) [4.25]





39. The following figures show the imports of cotton piece goods into
India from Great Britain during 1913-14 and a few post-war years.
Find (a) Index Number of quantity, (b) Index Number of values, and
(c) Index Number of prices using the figures of 1913-14 as base.

Quantity Million Yds.    Value Million £



1 



i 




2 

1 


JR 

u 

2 

1 


r 

8 

> 

?! 

oS 

il 

s* 

j.§a* 
2* * 

>-S 

4l C 

f 5rH 

White 

Blear 

3fe* 

ISIS-14 

1.334 

793 

m 

17 0 

9-5 

119 

IMS- SO 

926 

471 

m 

1 i 7 

100 

11*4 

IM0-S1 

365 

272 

24b 

52 

4 7 

.VI 

1SSI-32 

249 

2 GO 

223 

2 9 

40 

3 6 


Ignore the effects of exchange variation between the two countries.

i/ C V.. t<y0) [4,261 

40. Calculate the index number of prices from the data of Exercise 22 by
using weighted relative method. Show that Paasche's method gives the
same result as obtained by using weighted harmonic mean, weights
being the current year values, and Laspeyres' method gives the same
result as obtained by using weighted arithmetic mean, weights being
the base year values. [4.27]

41. Following data relate to construction of index numbers of industrial
production. From 1954 chemicals are to be included in the index and
from 1956 onwards pig iron is to be replaced by non-ferrous metals.
Construct a suitable series which can be used to compare changes in
production for the various years.


Camnoditisi 

Weights 1952 

1953 

1954 

1955 

1956 

1957 

1958 

Canton 

9 

4 

5 

5 

4 5 

4 

6 

6 7 

Juts 

6 

3 

4 

15 

5 

55 

4 5 

3 6 

hterl 

5 

2 

2 3 

28 

2 9 

3 0 

3 2 

4 


6 

4 

4 8 

5 

6 

7 

«. 

— 

Cwenf 

5 

7 

7 3 

7 7 

8 1 

8 5 

8’6 

9 

Chemicals 

3 

— 

—. 

4 

45 

5 

5*8 

59 

N/wi.ferrous Met si* 4 


“ 

- h 

0 

i 

75 

[4.28] 



Chapter 18 

Correlation and Regression 


The statistical methods that have been discussed so far
were concerned with the description and analysis of single
variables. In this chapter, we describe and discuss the methods
that are employed to determine if there exists any relationship
between two variables and to express this relationship numerically.

When in a given group of individuals measures of two
characteristics of each individual are obtained, it may frequently
be observed that the two measures of each individual
have a tendency to occupy almost the same relative position in
their respective distributions. Thus if measures of heights and
weights of the students of a class are secured it will be seen
that if the height of a student is considerably above the average
height, his weight will also be considerably above the average
weight and if the height of a student is below the average,
his weight will also tend to be correspondingly below the
average. When this kind of phenomenon is observed we
say that the two characteristics are mutually related or
correlated.

Thus two variables will be said to be correlated if an increase
in one variable is on an average accompanied by an increase
(or decrease) in the average of the other and correspondingly
a decrease in one variable is on an average accompanied by
a decrease (or increase) in the other. Simply stated, correlation
is said to exist when the two groups or series of items vary
together directly or inversely.

Positive and Negative Correlation

If higher values of the one variable are associated with
higher values of the other variable, or when lower values of
the one are accompanied by the lower values of the other (i.e.,
when the movements of the two variables are in the same
direction) it is said to be positive or direct correlation, e.g., the
greater the radius of the circle, the greater will be its circumference ;
the higher the dividend declared by a company, the
higher will be the market price of its shares.

If, on the other hand, the higher values of the one are
associated with the lower values of the other (i.e., when the
movements of the two variables are in opposite directions), the
correlation is said to be negative or inverse, e.g., the income of
individuals and the proportion of income spent on food are likely
to be negatively correlated.

Degree of Correlation. Between two observed phenomena,
the relationship may range all the way from no relationship at
all to a relationship so close that one is inclined to think that
one phenomenon is the function of the other. The circumference
of a circle increases in a perfectly definite ratio with an
increase in the length of its diameter, or the amount of lighting
bill increases in a perfectly definite ratio with an increase in
the number of units consumed. These are the cases where
correlation is perfect and positive. Correlation will be perfectly
negative if an increase in one variable is accompanied by
a decrease, in a perfectly definite ratio, in the other variable.
In the case of gases obeying Boyle's law the volume varies
inversely with the pressure (at constant temperature); if the
pressure be doubled the volume will be halved. The instances
given above are not those of variables from social sciences. In
correlation of variables from social sciences, however high it
may be, an increase in one variable need not always be accompanied
by a corresponding increase (or decrease) in the
other variable. It is only on an average that an increase in
one is accompanied by a corresponding increase (or decrease)
in the other.

There can be instances where no correlation exists. If, for
example, we compare the number of cars registered with the



CORRELATION AND REGRESSION 503 

total number of births recorded during a period of years we
will discover that the two variables are in no way related.

Again there may be cases where correlation exists only
to a limited extent. Correlation is said to exist to a limited
extent when a change in one variable brings about a change in
the other variable, but the change in the latter bears no
definite ratio to the change in the former. If the acreage
under wheat increases its yield may also increase but not
necessarily in the same ratio. If the acreage under wheat
increases, that under cotton may fall but not necessarily in the
same ratio. The former is the example of limited positive
correlation and the latter of limited negative correlation.

Thus correlation may be:

(1) Perfect positive,
(2) Limited degree of positive,
(3) No correlation at all,
(4) Limited degree of negative,
(5) Perfect negative.

When we find perfect positive relationship existing between
two variables we designate it as +1; perfect negative relationship
is described as -1 and no relationship as 0. Thus our
observed result must vary between -1 and +1.

Causation and Correlation 

The presence of correlation between two variables does not
necessarily imply the existence of direct causation, though 
causation will always result in correlation. Correlation may 
be due to any one of the following factors : 

(i) One Variable being the Cause of the Other. As to which is
the cause and which the effect is to be judged from the
circumstances of the case, e.g., in case of quantity of money in
circulation and prices, the former is the cause and the latter is
the effect. That variable which is the cause is called 'subject'



504 AN INTRODUCTION TO STATISTICAL METHODS

or independent variable and is usually taken as x. The one
which is the effect is called 'relative' or dependent variable and
is represented by y.

(ii) Both Variables being the Result of a Common Cause. It
may sometimes be observed that the correlation that exists
between variables is due to their being related to some third
force, e.g., the positive correlation between the yield per acre
of rice and of jute is due to the fact that the two are related to
the amount of rainfall.

(iii) Chance. It might sometimes happen that between
two variables a fair degree of correlation may be observed
when none exists in the universe. Such a correlation is known
as spurious. While interpreting the correlation coefficient it is
essential to see if there is any likelihood of any relationship
existing between the variables under study. If there is no likelihood
the statistical correlation observed is meaningless.

Methods of Studying Correlation

1. The Scatter Diagram (Graphic Method)

2. Pearson's Coefficient of Correlation (Method of calculating
one numerical value)

3. The Coefficient of Rank Correlation

4. The Regression Line (Method of determining relationship
between variables)

1. The Scatter Diagram

The Scatter Diagram may be described as that diagram
which helps us to visualise the relationship between two phenomena.
The rules of plotting the dots are the same as were
discussed in the chapter on Graphic Representation, except
with the difference that here we do not take at the point of
origin the zero values of the x and y variables, but the minimum
values of the variables given in the question. Table 18.1
shows the number of workers employed and the annual turnover
of 20 factories.




TABLE 18.1

Factory No.    No. of Workers    Annual Turnover
                                 (thousands of rupees)
                     X                  Y

     1              105                100
     2              145                260
     3              165                400
     4              140                250
     5              125                190
     6              170                450
     7              115                150
     8              150                275
     9              135                210
    10              175                525
    11              180                550
    12              185                600
    13              130                200
    14              120                160
    15              155                300
    16              110                120
    17              160                350
    18              145                350
    19              135                300
    20              165                250


The data of table 18.1 are plotted in fig. 18.1. On the
x-axis we have taken the number of workers and on the y-axis
the annual turnover in thousands of rupees. Each dot in the
figure shows the number of workers on the x-axis and the annual
sales in thousands on the y-axis. In the case of factory no. 8 we
plot a dot at 150 on the x-axis against 275 on the y-axis.

The way in which the dots lie on the scatter diagram shows
the type of correlation. If the path formed by dots runs from
the lower left hand corner to the upper right hand corner (as
shown in fig. 18.1), it means the existence of positive correlation.





On the other hand, if the path formed by dots runs from the
upper left hand corner to the bottom right hand corner (as
shown in fig. 18.2), it means there exists negative or inverse
correlation. If the dots do not have any clear direction there is
no correlation at all between the observed variables. Fig. 18.3
shows such a situation.



Fig. 18.1 

When all the dots lie on a straight line from the left hand
bottom corner to the right hand upper corner, this is the case of
perfect positive correlation. On the other hand, if the dots lie on
a straight line from the left hand upper corner to the right hand
bottom corner, it is indicative of perfect negative correlation.¹

¹ The path of the dots in the scatter diagram may take the shape of a
curve. In that case the correlation will be curvilinear. For the purpose
of this book we are concerned with linear correlation only.






















Thus a scatter diagram enables us to know at a single glance
the existence or absence of relationship between two variables.
This method, however, cannot indicate the extent of correlation.
The scatter diagram is sometimes called 'Correlation
Chart' for the simple reason that such charts provide a method
of finding out the existence of correlation.
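
The plotting rule described above can be tried out without graph paper. The following is only a rough sketch: the function name `ascii_scatter` and the grid size are illustrative assumptions, and the figures are those of table 18.1 as reconciled with table 18.9. It places the origin at the minimum values of the two variables, as the text prescribes.

```python
# Workers (X) and annual turnover in thousands of rupees (Y) for the
# 20 factories of table 18.1, reconciled with the figures of table 18.9.
workers = [105, 145, 165, 140, 125, 170, 115, 150, 135, 175,
           180, 185, 130, 120, 155, 110, 160, 145, 135, 165]
turnover = [100, 260, 400, 250, 190, 450, 150, 275, 210, 525,
            550, 600, 200, 160, 300, 120, 350, 350, 300, 250]


def ascii_scatter(xs, ys, width=40, height=15):
    """Return text rows in which each '*' marks one (x, y) pair.

    The origin is the minimum x and minimum y, not zero. The sketch
    assumes that xs and ys each contain at least two distinct values.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the chart
    return ["".join(line) for line in grid]


if __name__ == "__main__":
    for line in ascii_scatter(workers, turnover):
        print("|" + line)
```

The dots drift from the lower left to the upper right, which is the pattern the text associates with positive correlation; the chart says nothing about its degree.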

2. Karl Pearson's Coefficient of Correlation

The coefficient of correlation measures the degree of correlation
existing between two phenomena. A good measure of
coefficient of correlation is one which supplies the answer in
pure number, independent of the units in which the variables
have been expressed, and also indicates the direction of the
correlation. As we have seen the scatter diagram shows only
the type of correlation and not the extent of correlation. In
the case of regression line discussed later in this chapter,
regression coefficients tell us the number of units of change in
the y-variable which will accompany a change of one unit in
the x-variable.

To determine the exact degree of correlation and direction
of correlation, Karl Pearson's method is the most satisfactory.
Coefficient of correlation is usually designated by the letter r.
Its formula is:

$$ r = \frac{\sum xy}{N \sigma_x \sigma_y} \qquad \text{or} \qquad r = \frac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}} $$

x stands for the deviations of the individual items of the
subject from their mean;

y stands for the deviations of the individual items of the
relative from their mean;

N stands for the number of pairs of items;

$\sigma_x$ stands for the standard deviation of X;

$\sigma_y$ stands for the standard deviation of Y.

We will illustrate the application of the formula in the 
following example ; 





TABLE 18.2

Calculation of Karl Pearson's Coefficient of Correlation
between Marks in Statistics and in Mathematics

              SUBJECT                              RELATIVE
  (1)       (2)        (3)       (4)       (5)         (6)       (7)       (8)
Serial    Marks in   Deviation  Square   Marks in    Deviation  Square   Product
No. of    Statistics from mean  of dev.  Mathematics from mean  of dev.  of dev.
student      X          x         x²        Y           y         y²       xy

   1         65        -3         9         67         -2         4        6
   2         66        -2         4         68         -1         1        2
   3         67        -1         1         65         -4        16        4
   4         67        -1         1         68         -1         1        1
   5         68         0         0         72         +3         9        0
   6         69        +1         1         72         +3         9        3
   7         70        +2         4         69          0         0        0
   8         72        +4        16         71         +2         4        8

Total
 N = 8   ΣX = 544   Σx = 0   Σx² = 36   ΣY = 552   Σy = 0   Σy² = 44   Σxy = 24

The formula is:

$$ r = \frac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}} $$

The formula needs the computation of the sum of squares
of the deviations from the mean of the subject variable (X). Its
calculation is shown in table 18.2, col. (4). It also requires the
calculation of the sum of squares of the deviations from the mean
of the dependent variable Y. Its calculations are given in col. (7).
The sum of xy is obtained by multiplying the deviations of X with the
corresponding deviations of Y and adding these products (with
due regard to algebraic signs).










$$ \bar{X} = \frac{\sum X}{N} = \frac{544}{8} = 68 \qquad \bar{Y} = \frac{\sum Y}{N} = \frac{552}{8} = 69 $$


Applying Karl Pearson's formula to our data in table 18.2,
we get:

$$ r = \frac{24}{\sqrt{36 \times 44}} = \frac{24}{39.8} = +.60 $$
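
The arithmetic of table 18.2 can be checked with a few lines of code. This is a minimal sketch, assuming plain Python lists; the function name `pearson_r` is illustrative only. It follows the deviation form of the formula, taking deviations from the actual means.

```python
from math import sqrt


def pearson_r(xs, ys):
    """r = sum(xy) / sqrt(sum(x^2) * sum(y^2)), deviations from actual means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]          # column (3) of table 18.2
    dy = [y - mean_y for y in ys]          # column (6)
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_x2 = sum(a * a for a in dx)
    sum_y2 = sum(b * b for b in dy)
    return sum_xy / sqrt(sum_x2 * sum_y2)


statistics_marks = [65, 66, 67, 67, 68, 69, 70, 72]
mathematics_marks = [67, 68, 65, 68, 72, 72, 69, 71]
r = pearson_r(statistics_marks, mathematics_marks)
print(round(r, 3))  # 0.603, i.e. 24 / sqrt(36 * 44)
```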


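In the above example we have used actual means in the
computation of x and y. Whenever the means are in fractions,
deviations should be obtained from assumed means. This will
save us a lot of lengthy calculations. When the deviations are
obtained from assumed means the above formula takes the
following form:

$$ r = \frac{\sum x'y' - \dfrac{\sum x' \cdot \sum y'}{N}}{\sqrt{\left[\sum x'^2 - \dfrac{(\sum x')^2}{N}\right]\left[\sum y'^2 - \dfrac{(\sum y')^2}{N}\right]}} $$

where x' and y' stand for the deviations of X and Y from their
assumed means.

This formula simplifies the work of calculation and is, therefore,
recommended for use to the students.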

Illustration:

Table showing the calculations of the Coefficient of Correlation
between the Net Area Sown and the Number of
Ploughs used in different States of a Country.







TABLE 18.3

           Net Area Sown                      No. of Ploughs
States     X     Deviation   Square of     Y     Deviation   Square of   Product of
                 from        deviation           from        deviation   deviations
                 assumed                         assumed
                 mean (211)                      mean (25)
                    x'          x'²                 y'          y'²         x'y'

  A       359     +148        21,904       52      +27          729       +3,996
  B       310      +99         9,801       44      +19          361       +1,881
  C       285      +74         5,476       12      -13          169         -962
  D       275      +64         4,096       24       -1            1          -64
  E       257      +46         2,116       35      +10          100         +460
  F       245      +34         1,156       16       -9           81         -306
  G       240      +29           841       46      +21          441         +609
  H        64     -147        21,609       11      -14          196       +2,058
  I        48     -163        26,569        3      -22          484       +3,586
  J        23     -188        35,344        2      -23          529       +4,324

N = 10         Σx' = -4   Σx'² = 128,912        Σy' = -5   Σy'² = 3,091   Σx'y' = 15,582

$$ r = \frac{15{,}582 - \dfrac{(-4)(-5)}{10}}{\sqrt{\left[128{,}912 - \dfrac{(-4)^2}{10}\right]\left[3{,}091 - \dfrac{(-5)^2}{10}\right]}} = \frac{15{,}580}{\sqrt{128{,}910.4 \times 3{,}088.5}} $$

  = antilog [4.1925 - ½ (5.1103 + 3.4898)]

  = antilog [4.1925 - 4.3001]

  = antilog $\bar{1}.8924$

r = +.78
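
The assumed-mean short cut lends itself to a brief check by machine. The sketch below is illustrative only: the function name is an assumption, and the figures are those of table 18.3 as reconstructed from its deviation columns, with 211 and 25 as the assumed means implied by those deviations.

```python
from math import sqrt


def pearson_r_assumed_means(xs, ys, assumed_x, assumed_y):
    """Pearson's r from deviations taken about assumed (not actual) means."""
    n = len(xs)
    dx = [x - assumed_x for x in xs]
    dy = [y - assumed_y for y in ys]
    sum_dx, sum_dy = sum(dx), sum(dy)
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    # The correction terms remove the error of using assumed means.
    numerator = sum_dxdy - sum_dx * sum_dy / n
    denominator = sqrt((sum_dx2 - sum_dx ** 2 / n) *
                       (sum_dy2 - sum_dy ** 2 / n))
    return numerator / denominator


net_area = [359, 310, 285, 275, 257, 245, 240, 64, 48, 23]
ploughs = [52, 44, 12, 24, 35, 16, 46, 11, 3, 2]
r = pearson_r_assumed_means(net_area, ploughs, 211, 25)
print(round(r, 2))
```

Whatever assumed means are chosen, the correction terms bring the result back to the same value of r; the choice only affects the size of the figures to be squared.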

There is another formula also which can be used for the
computation of r, viz.,

$$ r = \frac{\sum XY - N\bar{X}\bar{Y}}{\sqrt{\left(\sum X^2 - N\bar{X}^2\right)\left(\sum Y^2 - N\bar{Y}^2\right)}} $$

Illustration:

Find the coefficient of correlation between X and Y.

X     1    2    3    4    5    6    7    8    9

Y    12   11   13   15   14   17   16   19   18

Solution:

TABLE 18.4

   X      Y      X²       Y²      XY

   1     12       1      144      12
   2     11       4      121      22
   3     13       9      169      39
   4     15      16      225      60
   5     14      25      196      70
   6     17      36      289     102
   7     16      49      256     112
   8     19      64      361     152
   9     18      81      324     162

  45    135     285    2,085     731

$$ \bar{X} = \frac{45}{9} = 5 \qquad \bar{Y} = \frac{135}{9} = 15 $$

$$ r = \frac{\sum XY - N\bar{X}\bar{Y}}{\sqrt{\left(\sum X^2 - N\bar{X}^2\right)\left(\sum Y^2 - N\bar{Y}^2\right)}} = \frac{731 - 9 \times 5 \times 15}{\sqrt{(285 - 9 \times 25)(2{,}085 - 9 \times 225)}} $$

$$ = \frac{56}{\sqrt{60 \times 60}} = \frac{56}{60} = \frac{14}{15} = +.933 $$
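
For comparison, the same illustration can be worked from the original values without taking any deviations. A minimal sketch follows, assuming plain Python lists; the function name `pearson_r_raw` is illustrative only.

```python
from math import sqrt


def pearson_r_raw(xs, ys):
    """Pearson's r computed from the sums of X, Y, X^2, Y^2 and XY."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = sum_xy - n * mean_x * mean_y
    denominator = sqrt((sum_x2 - n * mean_x ** 2) *
                       (sum_y2 - n * mean_y ** 2))
    return numerator / denominator


xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [12, 11, 13, 15, 14, 17, 16, 19, 18]
r = pearson_r_raw(xs, ys)
print(round(r, 3))  # 0.933, i.e. 56 / 60
```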

Coefficient of Correlation in Continuous Series

In the foregoing illustrations the given data were either of
the discrete type or related to quantitative individual observations.
It is possible to calculate the coefficient of correlation
from continuous series also. Generally the data of the continuous
series are classified in a two-way frequency table.
Such tables are termed as 'double entry' or 'contingency' or
'correlation' tables. When a computation is based upon the
correlation table we have to make certain assumptions relating
to the distribution of the items within the classes. Every item
which falls within a given class-interval is assumed to fall
exactly at the mid-value of that class. The errors caused by
this assumption will be, in general, of a compensating nature; and
the net total error will not be significant, provided the class-widths
are moderate and the series is one for which the
computation of r is suitable.

As an illustration we can take a table which shows the
relationship between the age of a husband and that of his wife.
One would rather expect from observation that an old husband
would tend to have an old wife and a young husband would
tend to possess a young wife. If this is correct we should get a
positive correlation. The data appear in the following table:



                                Age of wives
Age of
husbands     15-25   25-35   35-45   45-55   55-65   65-75   Total

15-25           1       1     ...     ...     ...     ...       2
25-35           2      12       1     ...     ...     ...      15
35-45         ...       4      10       1     ...     ...      15
45-55         ...     ...       3       6       1     ...      10
55-65         ...     ...     ...       2       4       2       8
65-75         ...     ...     ...     ...       1       2       3

Total           3      17      14       9       6       4      53

Each figure in the above table indicates the number of
married couples where the husband's age was within the range
given at the left of the row and the wife's age was within the
range given at the top of the column. That is, all the couples
listed in a row have husbands of approximately the same age,
but as we pass from left to right we pass to cases
of greater and greater ages of wives. Similarly, all wives listed
in a column have approximately the same age, but as we pass
from top to bottom we pass to the cases of greater and greater
ages of husbands.

The formula for the computation of r in this case will take the
following form:

$$ r = \frac{\sum fxy - \dfrac{\sum fx \cdot \sum fy}{N}}{\sqrt{\left[\sum fx^2 - \dfrac{(\sum fx)^2}{N}\right]\left[\sum fy^2 - \dfrac{(\sum fy)^2}{N}\right]}} $$

The only difference in the above form of the formula is the
introduction of f which denotes frequency. The calculation of
Σfy and Σfy² is shown on the right hand side of table 18.5;
and of Σfx and Σfx² towards the bottom of the table. The
central part of the table gives the computation of Σfxy.










$$ r = \frac{83}{\sqrt{96 \times 87}} $$

  = antilog [log 83 - ½ (log 96 + log 87)]

  = antilog [1.9191 - ½ (1.9823 + 1.9395)]

  = antilog [1.9191 - 1.9609]

  = antilog $\bar{1}.9582$

r = +.9082
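
The working of a correlation table can also be sketched in code. What follows is a hedged illustration and not the layout of table 18.5: the names `grouped_r`, `husband_mid` and `wife_mid` are assumptions, every couple in a cell is placed at the mid-values of its two classes, and blank cells of the table of ages are entered as zero.

```python
from math import sqrt

# Mid-values of the classes 15-25, 25-35, ..., 65-75.
husband_mid = [20, 30, 40, 50, 60, 70]   # one per row
wife_mid = [20, 30, 40, 50, 60, 70]      # one per column

# couples[i][j]: couples with husband in class i and wife in class j.
couples = [
    [1, 1, 0, 0, 0, 0],
    [2, 12, 1, 0, 0, 0],
    [0, 4, 10, 1, 0, 0],
    [0, 0, 3, 6, 1, 0],
    [0, 0, 0, 2, 4, 2],
    [0, 0, 0, 0, 1, 2],
]


def grouped_r(row_values, col_values, freq):
    """Pearson's r for a two-way frequency table using class mid-values."""
    n = sum_fx = sum_fy = sum_fx2 = sum_fy2 = sum_fxy = 0
    for x, row in zip(row_values, freq):
        for y, f in zip(col_values, row):
            n += f
            sum_fx += f * x
            sum_fy += f * y
            sum_fx2 += f * x * x
            sum_fy2 += f * y * y
            sum_fxy += f * x * y
    numerator = sum_fxy - sum_fx * sum_fy / n
    denominator = sqrt((sum_fx2 - sum_fx ** 2 / n) *
                       (sum_fy2 - sum_fy ** 2 / n))
    return n, numerator / denominator


n, r = grouped_r(husband_mid, wife_mid, couples)
print(n, round(r, 2))
```

Carried to full precision the result comes to about .907, a shade below the .9082 obtained in the text, where the two quantities under the radical are rounded to 96 and 87 before the logarithms are taken.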

Coefficient of Correlation for Historical Series

In the preceding examples coefficients of correlation have
been computed from data which exist at a point of time.
Coefficient of correlation may also be computed for such data
as are spread over time. As explained earlier the variations
in the long time series are often the result of long-term
and short-term forces. Now, our object may be to study
correlation either between long-term changes of two historical
variables or between short-time changes only.

The coefficient method as explained so far, if applied to
original historical data as they stand, without any preliminary
analysis, would show the correlation between the combined
long- and short-time changes of the subject and the relative.
If we are required to study correlation between long-term
changes only, it is necessary that our data should be rid of
short-time fluctuations. It means that the correlation is to be
computed between the trend figures of the two variables. In
such a case the point of origin from which deviations are to be
measured will not be the mean of the original figures but the
mean of the trend values. The easiest and also reliable method
of finding out the trend is to ascertain the moving average.¹

In general practice, however, it is not customary to find
out the trend values of the data for studying correlation in the
long-term changes of two historical variables and the correlation
coefficient computed on the basis of original data is taken
to represent correlation between long-term changes.

¹ See chapter on Analysis of Time Series.





Illustration:

The following table gives the value of exports of raw cotton
from India and the value of the imports of manufactured
cotton goods into India during the years 1913-14 to 1931-32:


TABLE 18.6

In Crores of Rupees

Year        Exports of      Imports of Manufactured
            Raw Cotton      Cotton Goods

1913-14         42                 56
1917-18         44                 49
1919-20         58                 53
1920-21         55                 58
1923-24         89                 65
1929-30         98                 76
1931-32         66                 58

Calculate the Coefficient of Correlation. 

Solution:

Calculation of the coefficient of correlation between the
value of the exports of raw cotton and the value of the imports
of cotton manufactured goods.

              Exports of Raw Cotton               Cotton goods
Year        X    Deviation   Square of     Y    Deviation   Square of   Product of
                 from 58     deviation          from 58     deviation   deviations
                    x'          x'²                y'          y'²         x'y'

1913-14     42     -16         256        56      -2            4          32
1917-18     44     -14         196        49      -9           81         126
1919-20     58       0           0        53      -5           25           0
1920-21     55      -3           9        58       0            0           0
1923-24     89     +31         961        65      +7           49         217
1929-30     98     +40       1,600        76     +18          324         720
1931-32     66      +8          64        58       0            0           0

N = 7          Σx' = +46  Σx'² = 3,086        Σy' = +9   Σy'² = 483   Σx'y' = 1,095










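Applying to the above data the formula, we get:

$$ r = \frac{\sum x'y' - \dfrac{\sum x' \cdot \sum y'}{N}}{\sqrt{\left[\sum x'^2 - \dfrac{(\sum x')^2}{N}\right]\left[\sum y'^2 - \dfrac{(\sum y')^2}{N}\right]}} = \frac{1{,}095 - \dfrac{46 \times 9}{7}}{\sqrt{\left[3{,}086 - \dfrac{(46)^2}{7}\right]\left[483 - \dfrac{(9)^2}{7}\right]}} $$

$$ = \frac{1{,}095 - 59.14}{\sqrt{2{,}784 \times 471}} $$

  = antilog [log 1,035.86 - ½ (log 2,784 + log 471)]

  = antilog (3.0152 - 3.0588)

  = antilog ($\bar{1}.9564$)

  = +.904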

Applied to Short-term Changes

To study correlation for short-term changes, it is necessary
that our data should be rid of long-term changes, i.e., the trend
should be eliminated from the original data. Since the original
figure is a composite of long- and short-time variations, if we
subtract the long-time variations (trend) we will be able to
study the short-time variations independently. In employing
this method we find out the deviations of the original data from
the trend values.

Further the number of pairs of items (n) is not the number
of original figures, but the number of pairs of items of trend of
the subject and the relative.





Illustration:

TABLE 18.7

Showing the computation of the coefficient of correlation of
the short-time oscillations between supply and price.

                    Supply                                  Price
Year    Supply  3-yearly  Deviation  Square of   Price  3-yearly  Deviation  Square of  Product of
                moving    from       deviation          moving    from       deviation  deviations
                average   trend                         average   trend
                             x          x²                           y          y²         xy

1945      80      ...       ...        ...        146     ...       ...        ...        ...
1946      82       83       -1          1         140     143       -3          9         +3
1947      87       87        0          0         143     139       +4         16          0
1948      92       88       +4         16         134     136       -2          4         -8
1949      85       87       -2          4         131     135       -4         16         +8
1950      84       85       -1          1         140     139       +1          1         -1
1951      86       88       -2          4         146     137       +9         81        -18
1952      94       93       +1          1         125     140      -15        225        -15
1953      99      ...       ...        ...        149     ...       ...        ...        ...

n = 7                              Σx² = 27                               Σy² = 352   Σxy = -31


Solution:

Number of pairs of observations or n = 7.

Substituting the above values in Pearson's formula:

$$ r = \frac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}} = \frac{-31}{\sqrt{27 \times 352}} = \frac{-31}{\sqrt{9{,}504}} = \frac{-31}{97.5} = -.318 $$

-.318 denotes inverse correlation between supply and
price for short period which indicates that with an increase in
supply price falls and vice versa.
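
The elimination of trend by a moving average can be sketched as follows. This is an illustration only: the function names are assumptions, the 3-yearly period is inferred from the trend figures of the table, and, as in the worked solution, the deviations from trend are used directly in Pearson's formula.

```python
from math import sqrt


def moving_average(values, period=3):
    """Centred moving average; only full windows are kept (odd period)."""
    half = period // 2
    return [sum(values[i - half:i + half + 1]) / period
            for i in range(half, len(values) - half)]


def short_term_r(xs, ys, period=3):
    """Correlation of deviations from trend: sum(xy)/sqrt(sum(x^2)*sum(y^2))."""
    half = period // 2
    trend_x = moving_average(xs, period)
    trend_y = moving_average(ys, period)
    # The first and last `half` items have no trend value and drop out.
    dx = [x - t for x, t in zip(xs[half:], trend_x)]
    dy = [y - t for y, t in zip(ys[half:], trend_y)]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_x2 = sum(a * a for a in dx)
    sum_y2 = sum(b * b for b in dy)
    return len(dx), sum_xy / sqrt(sum_x2 * sum_y2)


supply = [80, 82, 87, 92, 85, 84, 86, 94, 99]            # 1945 to 1953
price = [146, 140, 143, 134, 131, 140, 146, 125, 149]
n, r = short_term_r(supply, price)
print(n, round(r, 3))
```

Nine years of data leave only seven pairs of deviations, which is why n is taken as 7 in the solution.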


Method of Concurrent Deviation

In such historical variables where no trends are apparent
or where the trend is considered unimportant, the method of
coefficient of concurrent deviations may be employed satisfactorily.
In a majority of cases this method gives accurate
results similar to Pearson's coefficient with much less calculation.
This method of concurrent deviation is unsuitable for
studying correlation between long-term variations, as it does
not consider the general long-run tendency of the data.

The most important characteristics of this method are:

(i) The deviations of each item are measured not from
the true or assumed mean or moving mean, but from the
preceding item.

(ii) Only the directions of the deviations, i.e., positive or
negative, and not the extent of deviations are considered.

The steps involved under this method are:

(i) Examine the fluctuations of each series and find
whether each item increases or diminishes in comparison with
the item just preceding. All increases are noted as plus and
all decreases as minus.

(ii) Count the number of items, where increases or decreases
concurrently occur in the subject and the relative. Such
cases are considered as concurrences. These are denoted by
the letter c.

(iii) Count the number of items, where an increase in the
subject is accompanied by a decrease in the relative or vice
versa. Such cases are considered disagreements or non-concurrences.


Lastly we should apply the formula:

$$ r = \pm\sqrt{\pm\frac{2c - n}{n}} $$

where

r = coefficient of correlation
c = number of concurrences
n = number of comparisons

To quote King, "The use of the ± signs requires a word
of explanation. If the quantity (2c - n)/n is negative the minus sign
is introduced before it and also before the radical. This
is necessary in order that the square-root may be extracted
and the result retain the same sign as that of the original
quantity."

We will apply the formula to the data shown in the following
table:

TABLE 1H.8 

Method of Concurrent Deviations 


Starks in 
Statistics 

Marks m 
Accountancy 

*>■ 

Student j Marks 

Deviations 

from 

[♦receding 

students 

X 

...-J 

Marks 

: 

Deviations 

from 

preceding 

students 

t : £ 

q ! o 

S b 

£ i g 

U <ti 

3 ^ t 

£ f to 

a : .1 

- a 

A 65 


<io 



M -in 

- 

5.5 

-- 


C 33 


.50 


|. 

D 75 

-* 

56 



i: 63 


30 

- 

.4. 

f HO 

-r 

70 

r 

■*- , . 

(i 35 

• 

40 

- 

a- 

H 20 

- 

35 


4. t " 

1 80 

i.. 

00 

-|. 


J b0 


73 


4 ... 

K 50 

- 

00 











Thus we find that 9 cases out of 10 concur: n = 10 and
c = 9.

The formula gives:

$$ r = \pm\sqrt{\pm\frac{2 \times 9 - 10}{10}} = +\sqrt{+.80} = +.89 $$

Therefore there is a very high degree of direct correlation.
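
The steps of the method of concurrent deviations can be sketched in code. The names below are illustrative assumptions, the short series used at the end is hypothetical, and the treatment of an item that shows no change (counted here as a concurrence only when both series are unchanged) is an assumption on which the text is silent.

```python
from math import sqrt


def direction(change):
    """+1 for an increase, -1 for a decrease, 0 for no change."""
    return (change > 0) - (change < 0)


def coefficient_from_counts(c, n):
    """r = +/- sqrt(+/- (2c - n) / n); the sign follows that of (2c - n)."""
    quantity = (2 * c - n) / n
    return sqrt(quantity) if quantity >= 0 else -sqrt(-quantity)


def concurrent_deviation_r(xs, ys):
    """Compare each item with the preceding one and count concurrences."""
    signs_x = [direction(b - a) for a, b in zip(xs, xs[1:])]
    signs_y = [direction(b - a) for a, b in zip(ys, ys[1:])]
    n = len(signs_x)                      # number of comparisons
    c = sum(1 for sx, sy in zip(signs_x, signs_y) if sx == sy)
    return coefficient_from_counts(c, n)


# The counts found in the text: 9 concurrences in 10 comparisons.
print(round(coefficient_from_counts(9, 10), 2))  # 0.89

# A hypothetical pair of series: signs +, +, - against +, +, +.
r_demo = concurrent_deviation_r([1, 2, 3, 2], [5, 6, 7, 8])
```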

Graphic Method of Correlation 

Another way of studying correlation is to plot both the
series on a graph paper. If the curves so drawn run remarkably
parallel to each other throughout their upward or downward
path, it indicates positive correlation. On the other hand,
if the curves move in inverse directions, it indicates negative
correlation.

There is a drawback in the graphic method of finding
correlation, viz., that it can only show the mere existence
of positive or negative correlation and not the exact degree
of correlation.

Lag 

When there is a causal relationship existing between two
time series it will be frequently noted that a change in the
independent variable takes some time to have its effect upon
the dependent variable. For instance the quantity of money
in circulation and the cost of living index have a high degree
of positive correlation. But an increase in the money supply
will make itself felt on the cost of living index only after the
expiry of several months. This tendency on the part of effect
to occur only some time after the occurrence of the cause is
known as the 'lag'.

When 'lag' is known to exist it is necessary that a reasonable
allowance be made for it if a correct coefficient of correlation
is to be computed. This will involve the determination of the
'time lag', i.e., estimating the time which a change in one
variable ordinarily takes to have its effect upon the magnitude
of the other variable. This can be done easily by plotting the
two series on a graph paper and reading the time distance
between the peaks or troughs of the two curves. If the peak in
variable A comes six months after the peak in variable B we
can conclude that there is a six month time lag between the
two variables.

When once the time lag is determined the next step would
be to push backwards the dependent series in such a manner
that the time lag is totally eliminated. Thus if an increase in
the money supply in January results in a rise in the living
costs in July, the cost of living index should be pushed back in
such a manner that the figure for July is paired with the figure
of money supply for January. It is only after the series have
been adjusted like this that it will be possible to have a correct
estimate of the correlation coefficient.
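
The adjustment for lag amounts to pairing each figure of the cause with the figure of the effect a fixed number of periods later. The sketch below is illustrative only: the function names and the two short series are hypothetical, the second series having been made up so that it follows the first exactly two periods later.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from deviations about the actual means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    return (sum(a * b for a, b in zip(dx, dy)) /
            sqrt(sum(a * a for a in dx) * sum(b * b for b in dy)))


def lagged_r(cause, effect, lag):
    """Pair cause[t] with effect[t + lag] and correlate the pairs."""
    if lag < 0 or lag >= len(cause):
        raise ValueError("lag must be between 0 and len(cause) - 1")
    paired_cause = cause[:len(cause) - lag]   # drop the last `lag` items
    paired_effect = effect[lag:]              # push the effect back by `lag`
    return pearson_r(paired_cause, paired_effect)


# Hypothetical figures: from the third period on, cost = 3 * money + 1,
# where money is taken two periods earlier.
money = [10, 12, 15, 14, 18, 21, 20, 24]
cost = [30, 32, 31, 37, 46, 43, 55, 64]

r_unadjusted = lagged_r(money, cost, 0)
r_adjusted = lagged_r(money, cost, 2)
print(round(r_unadjusted, 3), round(r_adjusted, 3))
```

With monthly figures, pairing July's index with January's money supply corresponds to a lag of 6. Each period of lag removes one pair from the comparison, so a long lag on a short series leaves little to correlate.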
Illustration:

Find the coefficient of correlation between the age and the 
sum assured from the following table : 


Age group 

30 

Sum assured in rupees 

100 200 500 

1,000 

15-24 

IS 

20 

6 

2 


25-34 

21 

26 

6 

5 

I 

35-44 

10 

9 

3 

6 

1 

45-54 

7 

8 

5 

4 


55-64 

8 

3 

I 


... 


Find also the coefficient of correlation for the ages 15-44 and
discuss the results.


(Delhi, &. Cm. t 






For the whole data:

log r = 1.65225 - ½ (2.4024 + 3.2435)
      = 1.65225 - ½ (5.6459)
      = 1.65225 - 2.82295
      = $\bar{2}.8293$

∴ r = .0675

For the data up to 44 years:

$$ r = \frac{52 + 15.4}{\sqrt{(75 - 2.2)(1{,}589 - 109.3)}} = \frac{67.4}{\sqrt{72.8 \times 1{,}479.7}} $$

log r = 1.8287 - ½ (1.8621 + 3.1703)
      = 1.8287 - ½ (5.0324)
      = 1.8287 - 2.5162
      = $\bar{1}.3125$

∴ r = .205




Correlation for the whole data is very small and for the data
up to 44 years is quite significant and positive. This shows
that the correlation between 45 to 64 is significant and negative.
In other words, sum assured increases with age up to 44 and
then decreases.


3. The Coefficient of Rank Correlation

Karl Pearson's coefficient of correlation, as discussed
before, cannot be used in cases where the direct quantitative
measurement of the phenomenon under study is not possible,
for example honesty, efficiency, intelligence, etc. In such cases
one may rank or array the different items and apply Spearman's
method of rank differences for finding out the degree
of correlation. The formula for computing rank correlation
by the method is:

$$ R = 1 - \frac{6\sum D^2}{N(N^2 - 1)} \; ^{*} $$

where R denotes coefficient of Rank Correlation,

D denotes the differences between paired ranks, and

N stands for the number of pairs.

Illustration:

A group of ten workers of a factory is ranked according to
their efficiency by two different judges as follows:

Name of Worker    Judgment       Judgment
                  of Judge A     of Judge B

      A               4              3
      B               8              9
      C               6              6
      D               7              5
      E               1              1
      F               3              2
      G               2              4
      H               5              7
      I              10              8
      J               9             10

Compute the coefficient of rank correlation.

* This formula may be derived from Karl Pearson's formula for coefficient
of correlation.






Solution:

Computation of the Rank Correlation Coefficient

Name of Worker     R1      R2     D = R1 - R2     D²

      A             4       3         +1           1
      B             8       9         -1           1
      C             6       6          0           0
      D             7       5         +2           4
      E             1       1          0           0
      F             3       2         +1           1
      G             2       4         -2           4
      H             5       7         -2           4
      I            10       8         +2           4
      J             9      10         -1           1

                                              ΣD² = 20

$$ R = 1 - \frac{6\sum D^2}{N(N^2 - 1)} = 1 - \frac{6 \times 20}{10(10^2 - 1)} = 1 - \frac{120}{990} = +.88 $$

The opinion of the two judges with regard to the efficiency
of the workers shows great similarity.

This method can also be used where the actual values (and
not the ranks) are given. In such cases it will be necessary to
first rank the different items and then proceed in the same
manner as above.

Illustration:

A group of eight students get the following percentage of
marks in tests in Statistics and Accountancy:

Roll numbers     % of marks in     % of marks in
of students      Statistics        Accountancy

    11               50            [illegible]
    12               60                71
    13               65                60
    14               70                75
    15               75                90
    16               40                82
    17               70                70
    18               80                50

Compute the rank correlation coefficient.








Solution:

Since we are given the actual marks and not the ranks, it
will be necessary to rank the different values. Ranking may
be done either from the largest to the smallest or vice versa.
Assigning ranks from the highest to the lowest we get:

Roll No. of     Rank in        Rank in         Difference      D²
students        Statistics     Accountancy     in Ranks D

    11             7               3              +4          16.00
    12             6               5              +1           1.00
    13             5               7              -2           4.00
    14             3.5             4              -.5           .25
    15             2               1              +1           1.00
    16             8               2              +6          36.00
    17             3.5             6              -2.5         6.25
    18             1               8              -7          49.00

                                                       ΣD² = 113.50

$$ R = 1 - \frac{6\sum D^2}{N(N^2 - 1)} = 1 - \frac{6 \times 113.5}{8(8^2 - 1)} = 1 - \frac{681}{504} = -.35 $$

It may be noted that where two or more items of a group
have the same value, the rank in such a case is determined by
taking the average of the ranks which these items would have
occupied had they differed slightly from each other. Thus in
the above example Roll Nos. 14 and 17 get the same marks in
Statistics, i.e., 70 each. The value 70 stands third in the rank
but since it is repeated twice we will take the average of 3 and
4, i.e., 3.5.
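
The ranking of actual values, with the averaging of tied ranks, and the rank-difference formula can be sketched as follows. The function names are illustrative assumptions; the figures are the Statistics marks of the eight students, the ranks of the solution, and the two judges' rankings of the ten workers.

```python
def average_ranks(values):
    """Rank from highest (rank 1) to lowest; ties share the average rank."""
    ordered = sorted(values, reverse=True)
    ranks = []
    for value in values:
        first = ordered.index(value) + 1      # best rank the value could hold
        count = ordered.count(value)          # how many items share the value
        ranks.append(first + (count - 1) / 2)
    return ranks


def rank_correlation(ranks_1, ranks_2):
    """R = 1 - 6 * sum(D^2) / (N * (N^2 - 1))."""
    n = len(ranks_1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_1, ranks_2))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Ten workers ranked by two judges.
judge_a = [4, 8, 6, 7, 1, 3, 2, 5, 10, 9]
judge_b = [3, 9, 6, 5, 1, 2, 4, 7, 8, 10]
r_judges = rank_correlation(judge_a, judge_b)

# Eight students: Statistics marks are ranked, with a tie at 70.
statistics_ranks = average_ranks([50, 60, 65, 70, 75, 40, 70, 80])
accountancy_ranks = [3, 5, 7, 4, 1, 2, 6, 8]   # ranks as given in the solution
r_students = rank_correlation(statistics_ranks, accountancy_ranks)

print(round(r_judges, 2), round(r_students, 2))
```

The same two functions serve both cases: where ranks are given they go straight into `rank_correlation`; where marks are given they are first passed through `average_ranks`.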

Advantages of this method

1. This method of measuring relationship is easier as compared
to Karl Pearson's coefficient of correlation.

2. It is the only method of finding out relationship when
ranks are given.

3. The rank correlation is also used even when the actual
values are available as it saves time. Since a correlation
coefficient is not very reliable when the number of items is
small, it may be sometimes desirable to make an estimate of
the degree of relationship by the use of a rougher and more
quickly computed R. If R is low in value, then the chances
are good that the value of r will not be high and the calculation
of r may not be deemed worth while.

Usefulness. The greatest use of the method lies in the fact
that it enables us to measure the degree of relationship in case
of such statistical series as cannot be quantitatively expressed
and where the coefficient of correlation cannot be used.

Limitations. The main limitation of the method is that it
is not as accurate as Karl Pearson's coefficient of correlation.
Moreover, for N (the number of observations) exceeding 30
the labour of calculating the rank correlation becomes very
great. For this reason the method should not be used where
N is more than 30, unless the original data are ranks instead
of scores.

4. Regression Line

So far we have discussed the method of computing the
degree of correlation existing between two given variables. If,
however, it is intended to calculate the amount of change that
will normally take place in variable Y for a unit change in X, a
line¹ will have to be fitted to the points plotted in the scatter
diagram.

The standard form of the equation describing a line is

$$ Y = a + bX $$

When this equation describes the line marking the path
of the points in the scatter diagram it is called the regression
equation and the line it describes is called the line of regression
of Y on X.

The values a and b in the equation are termed constants,
i.e., their values are fixed. The first constant a indicates the
value of Y when X = 0. It is also called the Y-intercept. The
value of b indicates the slope of the regression line and gives
us a measure of the change in Y for a unit change in X. It is
also called the regression coefficient of Y on X and is written as
$b_{yx}$. If we know the value of a and b, we can easily compute
the value of Y for any given value of X. The values of a and b
are found with the help of the following two normal equations:

$$ \sum Y = Na + b\sum X $$
$$ \sum XY = a\sum X + b\sum X^2 $$

¹ Here we are concerned only with the linear regression.

Illustration:

TABLE 18.9

Number of Workers Employed and the Annual Sales

Factory   No. of     Annual Sales
  No.     Workers    (thousands of
                       rupees)
             X            Y              XY            X²            Y²

   1        105          100           10,500        11,025        10,000
   2        145          260           37,700        21,025        67,600
   3        165          400           66,000        27,225       160,000
   4        140          250           35,000        19,600        62,500
   5        125          190           23,750        15,625        36,100
   6        170          450           76,500        28,900       202,500
   7        115          150           17,250        13,225        22,500
   8        150          275           41,250        22,500        75,625
   9        135          210           28,350        18,225        44,100
  10        175          525           91,875        30,625       275,625
  11        180          550           99,000        32,400       302,500
  12        185          600          111,000        34,225       360,000
  13        130          200           26,000        16,900        40,000
  14        120          160           19,200        14,400        25,600
  15        155          300           46,500        24,025        90,000
  16        110          120           13,200        12,100        14,400
  17        160          350           56,000        25,600       122,500
  18        145          350           50,750        21,025       122,500
  19        135          300           40,500        18,225        90,000
  20        165          250           41,250        27,225        62,500

  N =      ΣX =         ΣY =          ΣXY =         ΣX² =         ΣY² =
   20      2,910        5,990        931,575       434,100     2,186,550

$$\Sigma Y = Na + b\,\Sigma X$$
$$\Sigma XY = a\,\Sigma X + b\,\Sigma X^2$$


Substituting the values from table 18.9, we have

$$5{,}990 = 20a + 2{,}910b \qquad \text{(i)}$$
$$931{,}575 = 2{,}910a + 434{,}100b \qquad \text{(ii)}$$

Multiplying the first equation by 2,910 and the second by 20,
we have

$$17{,}430{,}900 = 58{,}200a + 8{,}468{,}100b \qquad \text{(iii)}$$
$$18{,}631{,}500 = 58{,}200a + 8{,}682{,}000b \qquad \text{(iv)}$$

Subtracting equation (iii) from (iv), we have

$$213{,}900b = 1{,}200{,}600$$
$$b = \frac{1{,}200{,}600}{213{,}900} = 5.613$$

Substituting the value of b (taken as 5.61) in equation (i), we have

$$a = -516.75$$


The full computation of the regression equation is shown
in table 18.9. This gives us the values of a and b. When
these values are substituted our regression equation becomes

$$Y = -516.75 + 5.61X$$

With the help of this equation it is possible for us to
estimate the value of Y for any given X. Thus when X = 115

$$Y = -516.75 + 5.61 \times 115 = 128.4$$
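The elimination carried out by hand above can be checked mechanically. The following is a minimal Python sketch, not part of the original text, which solves the two normal equations from the column totals of table 18.9. The variable names and the helper `estimate` are illustrative assumptions.

```python
# Column totals from table 18.9 (N = 20 factories).
N = 20
sum_x = 2910        # sum of X  (number of workers)
sum_y = 5990        # sum of Y  (annual sales, thousands of rupees)
sum_xy = 931575     # sum of XY
sum_x2 = 434100     # sum of X squared

# Solve  sum_y = N*a + b*sum_x  and  sum_xy = a*sum_x + b*sum_x2  for b, then a.
b = (N * sum_xy - sum_x * sum_y) / (N * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / N

# The text rounds b to 5.61 before finding a, which leads to a = -516.75.
b_text = 5.61
a_text = (sum_y - b_text * sum_x) / N


def estimate(x, intercept=-516.75, slope=5.61):
    """Estimated Y on the line Y = a + bX, using the text's rounded constants."""
    return intercept + slope * x


print(f"b = {b:.3f}")
print(f"a with unrounded b = {a:.2f}")
print(f"a with b = 5.61    = {a_text:.2f}")
print(f"estimate for X = 115: {estimate(115):.1f}")
```

Carrying the unrounded slope through gives an intercept near -517.2 rather than -516.75; the difference arises only from rounding b to two decimals before substitution.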

The estimated value of Y (known as $Y_c$) is given in column 3
of the following table:


TABLE 18.10

    1         2           3             4              5
    X         Y          Y_c        (Y - Y_c)     (Y - Y_c)²

   105       100        72.30         27.70          767.29
   145       260       296.70        -36.70        1,346.89
   165       400       408.90         -8.90           79.21
   140       250       268.65        -18.65          347.82
   125       190       184.50          5.50           30.25
   170       450       436.95         13.05          170.30
   115       150       128.40         21.60          466.56
   150       275       324.75        -49.75        2,475.06
   135       210       240.60        -30.60          936.36
   175       525       465.00         60.00        3,600.00
   180       550       493.05         56.95        3,243.30
   185       600       521.10         78.90        6,225.21
   130       200       212.55        -12.55          157.50
   120       160       156.45          3.55           12.60
   155       300       352.80        -52.80        2,787.84
   110       120       100.35         19.65          386.12
   160       350       380.85        -30.85          951.72
   145       350       296.70         53.30        2,840.89
   135       300       240.60         59.40        3,528.36
   165       250       408.65       -158.65       25,169.82

           5,990     5,989.85                     55,523.10




The Standard Error of Estimate


It has been shown above that with the help of the regression
equation it is possible for us to estimate the value of Y for
any given X. Thus when X (the number of workers employed
in an undertaking) is 175, the turnover is estimated to be
465,000 rupees. Now this estimated value of Y is less than its
observed value (525,000). This means that the regression line
is not a perfect fit and all the points on the scatter diagram do
not fall on this line (see fig. 18.4). In other words, it may
be said that the regression line will not enable us to make
estimates exactly equal to the observed values of the sales when
the undertaking employs an assumed number of workers. It
may thus be said that the estimates will be in error. This
error is due to the fact that the variations in Y may not be due
exclusively to variations in X. There are other forces as well
which influence the size of Y.

In order to know how far the regression equation has
been able to explain the variations in Y, it is necessary to
measure the scatter of the points around the regression line.
If all the points on the scatter diagram fall on the regression
line itself, it means that the regression equation enables us to
make absolutely correct estimates of the values of Y. In
other words, we can say that the variations in Y are fully
explained by variations in X and there is no error in the
estimates. This will be the case only when there is a perfect
correlation between X and Y. But if the plotted points do
not fall upon the regression line and scatter widely from it,
the use of the regression equation as an explanation of the
variations in Y may be questioned. The regression equation
will be considered a useful device in estimating the values
of Y only if the estimates obtained by its help are more
correct than those made without it. It is only then that we
can be sure of a functional relationship between X and Y.

If the measure of the scatter of points from the regression
line is less than the measure of the scatter of the observed values
of Y from their mean, it can be inferred that the regression
equation is likely to be useful in estimating Y. The scatter of
points from the regression line is called 'the standard error of
estimating Y'. It is obtained commonly by the following
formula:

$$S_y = \sqrt{\frac{\Sigma (Y - Y_c)^2}{N}}$$

where $S_y$ = the standard error of estimate
      Y = the observed values of Y
      $Y_c$ = the estimated values of Y
      N = the number of pairs of items.


More accurate for small samples, and the value which is used in analysis
of variance, is



The procedure for calculating $S_y$ may be summarised as
follows:

1. With the help of the regression equation determine $Y_c$,
as given in column 3 of table 18.10.

2. Prepare another column $(Y - Y_c)$, i.e., find out the
deviations of the observed values of Y from their
respective estimated values (column 4 of table 18.10).

3. Square the deviations, as given in column 5.

4. Sum up the squares obtained by step 3.

5. Divide the sum by N and take the square root.

In our illustration $S_y = \sqrt{55{,}523.10 / 20} = 52.69$
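The five steps above can be sketched in Python as follows. This is an illustrative sketch rather than the author's procedure verbatim: the function name is an assumption, the data are the twenty (X, Y) pairs of table 18.9, and the constants are the rounded values a = -516.75 and b = 5.61 used in the text. Because the hand-worked table rounds its intermediate figures, the result should be expected to agree with 52.69 only to about the first decimal place.

```python
import math

# The twenty (X, Y) pairs of table 18.9.
workers = [105, 145, 165, 140, 125, 170, 115, 150, 135, 175,
           180, 185, 130, 120, 155, 110, 160, 145, 135, 165]
sales = [100, 260, 400, 250, 190, 450, 150, 275, 210, 525,
         550, 600, 200, 160, 300, 120, 350, 350, 300, 250]

A, B = -516.75, 5.61   # regression constants as rounded in the text


def standard_error_of_estimate(xs, ys, a, b):
    """Root mean squared deviation of Y from the line Y = a + bX."""
    estimated = [a + b * x for x in xs]                      # step 1
    deviations = [y - yc for y, yc in zip(ys, estimated)]    # step 2
    squares = [d * d for d in deviations]                    # step 3
    total = sum(squares)                                     # step 4
    return math.sqrt(total / len(xs))                        # step 5


s_y = standard_error_of_estimate(workers, sales, A, B)
print(f"S_y = {s_y:.2f}")
```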

It will be observed that the method of computing $S_y$ is
similar to that of calculating σ, with the only difference that
whereas in calculating σ deviations are measured from the
mean, in the case of $S_y$ they are measured from the regression
line (the estimated values of Y).

Interpretation of the Standard Error of Estimate

The standard error of estimate ($S_y$) may be interpreted
in exactly the same manner as the standard deviation (σ) of a
frequency distribution. Just as in the case of a normal frequency
distribution 68.27 per cent of items lie within a range of mean
± 1σ and 95.45 per cent of items within a range of mean
± 2σ, in the same manner 68.27 per cent (approx. 2/3) of the points
on the scatter diagram should lie within the belt formed by two
lines drawn parallel, one on either side, to the regression line at
a distance of 1 $S_y$ measured along the axis of Y (see fig. 18.5).
Similarly within the belt formed by parallel lines drawn at a
distance of 2 $S_y$, 95.45 per cent of the dots are expected to lie.
This statement would be true of all values that Y might take
within the given range of X provided the sample is
representative of the population.
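As a rough check of this interpretation, the share of points falling inside each belt can be counted directly. The sketch below is an illustration only, under the assumption that the text's rounded constants (a = -516.75, b = 5.61, S_y = 52.69) are used with the data of table 18.9; the variable names are hypothetical.

```python
# The twenty (X, Y) pairs of table 18.9.
workers = [105, 145, 165, 140, 125, 170, 115, 150, 135, 175,
           180, 185, 130, 120, 155, 110, 160, 145, 135, 165]
sales = [100, 260, 400, 250, 190, 450, 150, 275, 210, 525,
         550, 600, 200, 160, 300, 120, 350, 350, 300, 250]

A, B = -516.75, 5.61   # regression constants as rounded in the text
S_Y = 52.69            # standard error of estimate quoted in the text

# Vertical distance of each point from the regression line.
deviations = [y - (A + B * x) for x, y in zip(workers, sales)]

# Count the points inside the belts of half-width 1*S_y and 2*S_y.
within_one = sum(1 for d in deviations if abs(d) <= S_Y)
within_two = sum(1 for d in deviations if abs(d) <= 2 * S_Y)

share_one = 100 * within_one / len(deviations)
share_two = 100 * within_two / len(deviations)
print(f"within 1 S_y: {within_one} points ({share_one:.0f} per cent)")
print(f"within 2 S_y: {within_two} points ({share_two:.0f} per cent)")
```

For these twenty factories 13 points (65 per cent) fall inside the first belt and 19 (95 per cent) inside the second, which is close to the 68.27 and 95.45 per cent expected for a sample of this size.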



Fig. 18.5 (horizontal axis: No. of workers, X)

Explained and Unexplained Variability

In the preceding discussion we have tried to explain the
variance of the dependent variable Y with the help of the
regression equation. If all the points on the scatter diagram
had fallen on the regression line we could have said that the
entire variance of Y was due to variations in X. The measure
of the scatter of points away from the regression line gives us
an idea of the variance in Y which we have not been able to
explain with the help of the regression equation.












$$\text{The variance of } Y = \frac{\Sigma Y^2}{N} - (\bar{Y})^2 = 109{,}327.5 - 89{,}700.25 = 19{,}627.25$$

The variation of Y which we have not been able to explain
is represented by $(S_y)^2$, i.e., $(52.69)^2 = 2{,}776.2$ approx. This
means that the variance which we have been able to explain
(i.e., explained variance) is:

Total variance - Unexplained variance
= 19,627.25 - 2,776.20
= 16,851.05

On the basis of the unexplained variance we can compute
the coefficient of correlation with the help of the following
formula:

$$r^2 = 1 - \frac{\text{Unexplained Variance}}{\text{Total Variance}} = 1 - \frac{2{,}776.20}{19{,}627.25} = \frac{16{,}851.05}{19{,}627.25} \left(= \frac{\text{Explained Variance}}{\text{Total Variance}}\right) = .859$$

$$r = \sqrt{.859} = .927 \text{ (approx.)}$$
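The split of the total variance into explained and unexplained parts can be reproduced in a few lines. The sketch below is an illustration under the assumption that the totals of table 18.9 and the value S_y = 52.69 from the text are taken as given; the variable names are hypothetical.

```python
import math

N = 20
sum_y = 5990          # sum of Y from table 18.9
sum_y2 = 2186550      # sum of Y squared from table 18.9
s_y = 52.69           # standard error of estimate quoted in the text

# Total variance of Y: mean of squares minus square of the mean.
total_variance = sum_y2 / N - (sum_y / N) ** 2

# Unexplained variance is the square of the standard error of estimate.
unexplained_variance = s_y ** 2
explained_variance = total_variance - unexplained_variance

r_squared = explained_variance / total_variance
r = math.sqrt(r_squared)

print(f"total variance       = {total_variance:.2f}")
print(f"unexplained variance = {unexplained_variance:.2f}")
print(f"explained variance   = {explained_variance:.2f}")
print(f"r squared = {r_squared:.3f}, r = {r:.3f}")
```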

Regression of X on Y

If we desire to find out the values of X for given values of Y
we can do so by the help of another regression equation (called
the equation of regression of X on Y) whose form will be as
follows:

$$X = a' + b'Y$$

The values of a' and b' can be computed with the help of
another set of normal equations:

$$\Sigma X = Na' + b'\,\Sigma Y$$
$$\Sigma XY = a'\,\Sigma Y + b'\,\Sigma Y^2$$



Substituting values of ΣX, ΣY etc., we get

$$2{,}910 = 20a' + 5{,}990b' \qquad \text{(i)}$$
$$931{,}575 = 5{,}990a' + 2{,}186{,}550b' \qquad \text{(ii)}$$

Multiplying the first equation by 599 and the second by 2,
we get

$$1{,}743{,}090 = 11{,}980a' + 3{,}588{,}010b' \qquad \text{(iii)}$$
$$1{,}863{,}150 = 11{,}980a' + 4{,}373{,}100b' \qquad \text{(iv)}$$

Subtracting (iii) from (iv), we get

$$120{,}060 = 785{,}090b'$$
$$b' = \frac{120{,}060}{785{,}090} = .1529$$
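The same elimination can be written out in Python. This is a sketch under stated assumptions: the totals are those of table 18.9, and the intercept a' is not worked out in the text, so its computation here (and the helper `estimate_x`) is an illustrative addition.

```python
# Column totals from table 18.9 (N = 20 factories).
N = 20
sum_x, sum_y = 2910, 5990
sum_xy, sum_y2 = 931575, 2186550

# Solve  sum_x = N*a' + b'*sum_y  and  sum_xy = a'*sum_y + b'*sum_y2.
b_xy = (N * sum_xy - sum_x * sum_y) / (N * sum_y2 - sum_y ** 2)
a_prime = (sum_x - b_xy * sum_y) / N   # assumption: not computed in the text


def estimate_x(y):
    """Estimated X for a given Y on the line X = a' + b'Y."""
    return a_prime + b_xy * y


print(f"b' = {b_xy:.4f}")
print(f"a' = {a_prime:.2f}")
print(f"estimated workers for sales of 300: {estimate_x(300):.1f}")
```

By construction this line passes through the point of means, so `estimate_x(299.5)` returns the mean number of workers, 145.5.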


Regression Coefficient and the Coefficient of Correlation

It has already been explained that if all the points on the
scatter diagram fall on the regression line the correlation is
perfect. This is as much true about the regression line of Y
on X as about the line of X on Y. This means that if the
correlation is perfect the regression line of Y on X and the line
of X on Y are one and the same. This is because, when all the
points lie on a straight line, one and only one line can pass
through them. If, however, the two lines diverge and intersect each
other the correlation is not perfect. With the help of this we
can establish a relationship between regression coefficients and
the coefficient of correlation. This relationship can be
expressed as below:

$$r^2 = b \times b'$$

Substituting the values of b and b', we get

$$r^2 = 5.61 \times .1529$$
$$r = \sqrt{5.61 \times .1529} = .926 \text{ approx.}$$
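The relation between the two regression coefficients and r can be verified numerically. The sketch below, which is an illustration and not the author's working, computes both coefficients from the totals of table 18.9 and compares the square root of their product with r obtained directly from the product-moment formula; the variable names are assumptions.

```python
import math

# Column totals from table 18.9 (N = 20 factories).
N = 20
sum_x, sum_y = 2910, 5990
sum_xy, sum_x2, sum_y2 = 931575, 434100, 2186550

# Common numerator and the two denominators (all multiplied through by N).
cross = N * sum_xy - sum_x * sum_y
spread_x = N * sum_x2 - sum_x ** 2
spread_y = N * sum_y2 - sum_y ** 2

b_yx = cross / spread_x        # regression coefficient of Y on X
b_xy = cross / spread_y        # regression coefficient of X on Y

r_from_coefficients = math.sqrt(b_yx * b_xy)
r_direct = cross / math.sqrt(spread_x * spread_y)

print(f"b_yx = {b_yx:.4f}, b_xy = {b_xy:.4f}")
print(f"r from coefficients = {r_from_coefficients:.4f}")
print(f"r computed directly = {r_direct:.4f}")
```

The square root gives only the magnitude of r; its sign is that of the regression coefficients, which always share the same sign.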

Relationship between $b_{yx}$, $b_{xy}$, r, $\sigma_x$ and $\sigma_y$

It can be shown that $b_{yx}$ (regression coefficient of Y on X)
$= \dfrac{\Sigma xy}{\Sigma x^2} = r\,\dfrac{\sigma_y}{\sigma_x}$,
where x and y represent deviations from their respective means.

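(a) Regression equation of Y on X is $Y = a + bX$.
The normal equations are

$$\Sigma Y = Na + b\,\Sigma X \qquad \text{(i)}$$
$$\Sigma XY = a\,\Sigma X + b\,\Sigma X^2 \qquad \text{(ii)}$$

If instead of Y and X we substitute y and x (deviations of Y
and X from their respective means) our normal equations will
be as under:

$$\Sigma y = Na + b\,\Sigma x \qquad \text{(iii)}$$
$$\Sigma xy = a\,\Sigma x + b\,\Sigma x^2 \qquad \text{(iv)}$$

It has been shown elsewhere that

$$\Sigma x = 0, \qquad \Sigma y = 0$$

∴ our equation (iii) becomes

$$0 = Na, \quad \text{i.e., } a = 0$$

and equation (iv) becomes

$$\Sigma xy = 0 + b\,\Sigma x^2, \quad \text{i.e., } b = \frac{\Sigma xy}{\Sigma x^2}$$

Thus b is the same as $b_{yx}$.
Similarly, we can prove that

$$b_{xy} = \frac{\Sigma xy}{\Sigma y^2}$$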



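(b) The regression coefficient of Y on X

$$b_{yx} = \frac{\Sigma xy}{\Sigma x^2}$$

Dividing and multiplying the denominator by N, we get

$$b_{yx} = \frac{\Sigma xy}{N \cdot \dfrac{\Sigma x^2}{N}} = \frac{\Sigma xy}{N\sigma_x^2}$$

Multiplying the numerator and denominator by $\sigma_y$ we get

$$b_{yx} = \frac{\Sigma xy}{N\sigma_x\sigma_y} \times \frac{\sigma_y}{\sigma_x} = r\,\frac{\sigma_y}{\sigma_x}$$

Similarly, we can show that $b_{xy} = r\,\dfrac{\sigma_x}{\sigma_y}$.

Substituting this form of $b_{yx}$ in our regression equation it
becomes as follows:

$$(Y - \bar{Y}) = r\,\frac{\sigma_y}{\sigma_x}\,(X - \bar{X})$$

or

$$\frac{Y - \bar{Y}}{\sigma_y} = r\,\frac{X - \bar{X}}{\sigma_x}$$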
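The relations between the regression coefficients, r and the two standard deviations can be checked on any small set of paired values. The sketch below uses five hypothetical pairs (they do not come from the text) and works entirely with deviations from the means; the variable names are assumptions.

```python
import math

# Hypothetical paired observations, used only to illustrate the identities.
xs = [2, 4, 6, 8, 10]
ys = [3, 7, 5, 11, 14]
n = len(xs)

mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Deviations from the respective means (x and y in the text's notation).
dx = [x - mean_x for x in xs]
dy = [y - mean_y for y in ys]

sum_dxdy = sum(p * q for p, q in zip(dx, dy))
sum_dx2 = sum(p * p for p in dx)
sum_dy2 = sum(q * q for q in dy)

sigma_x = math.sqrt(sum_dx2 / n)
sigma_y = math.sqrt(sum_dy2 / n)
r = sum_dxdy / (n * sigma_x * sigma_y)

b_yx = sum_dxdy / sum_dx2      # regression coefficient of Y on X
b_xy = sum_dxdy / sum_dy2      # regression coefficient of X on Y

print(f"b_yx = {b_yx:.4f}  and  r * sigma_y / sigma_x = {r * sigma_y / sigma_x:.4f}")
print(f"b_xy = {b_xy:.4f}  and  r * sigma_x / sigma_y = {r * sigma_x / sigma_y:.4f}")
```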

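(c) We are now in a position to show that $b_{yx} \times b_{xy} = r^2$.

$$b_{yx} = \frac{\Sigma xy}{\Sigma x^2}, \qquad b_{xy} = \frac{\Sigma xy}{\Sigma y^2}$$

$$b_{yx} \times b_{xy} = \frac{(\Sigma xy)^2}{\Sigma x^2 \cdot \Sigma y^2} = \left(\frac{\Sigma xy}{\sqrt{\Sigma x^2 \cdot \Sigma y^2}}\right)^2 = r^2$$

or

$$r = \frac{\Sigma xy}{\sqrt{\Sigma x^2 \cdot \Sigma y^2}}$$

In case of a perfect correlation $b_{yx}$ should be the
reciprocal of $b_{xy}$.

For a perfect correlation $r = 1$, but $r^2 = b_{yx} \times b_{xy}$

$$\therefore\ (1)^2 = b_{yx} \times b_{xy}, \quad \text{i.e., } b_{yx} = \frac{1}{b_{xy}}$$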

The Point of Intersection of the Two Regression Lines

That a regression line passes through the $(\bar{X}, \bar{Y})$ point can be
shown in the following manner:

Regression line of Y on X as shown already is:

$$(Y - \bar{Y}) = r\,\frac{\sigma_y}{\sigma_x}\,(X - \bar{X})$$

If we take the value of X to be equal to $\bar{X}$ our equation
becomes as under:

$$(Y - \bar{Y}) = 0$$
$$Y = \bar{Y}$$

Thus we can say that when $X = \bar{X}$ the value of Y will be
equal to $\bar{Y}$, i.e., there is a point $(\bar{X}, \bar{Y})$ on the regression line of
Y on X. In the same manner we can show by taking the
regression equation of X on Y as:

$$(X - \bar{X}) = r\,\frac{\sigma_x}{\sigma_y}\,(Y - \bar{Y})$$

that here also is an $(\bar{X}, \bar{Y})$ point.

Thus we have seen that there is an $(\bar{X}, \bar{Y})$ point on both the
regression lines. Since there can be only one such point on the
scatter diagram it follows that this is the point where the two
lines intersect. We can thus conclude that if we solve the two
regression equations as simultaneous equations the values of X
and Y that we would obtain would respectively be the values
of $\bar{X}$ and $\bar{Y}$.




Illustration:

Find the most likely price in Bombay corresponding to the
price of Rs. 70 at Calcutta from the following data:

Average price at Calcutta               65
Average price at Bombay                 67
Standard deviation of Calcutta price    2.5
Standard deviation of Bombay price      3.5
r = .8 between the prices in the two towns.

(M.Com., Agra)

Solution:

If X represents the price at Bombay and Y at Calcutta the
price at Bombay can be found by the help of the following
regression equation:

$$(X - \bar{X}) = r\,\frac{\sigma_x}{\sigma_y}\,(Y - \bar{Y})$$

or $(X - 67) = .8 \times \dfrac{3.5}{2.5} \times (70 - 65)$

or $X - 67 = 5.6$
or $X = 67 + 5.6 = 72.6$
72.6 is the most likely price in Bombay.
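The solution above amounts to evaluating the regression equation of X on Y written in terms of means, standard deviations and r. A minimal Python sketch follows; the function and parameter names are illustrative assumptions, and the figures are those of the illustration.

```python
def estimate_x_from_y(y, mean_x, mean_y, sd_x, sd_y, r):
    """Most likely X for a given Y from  X - mean_x = r*(sd_x/sd_y)*(Y - mean_y)."""
    return mean_x + r * (sd_x / sd_y) * (y - mean_y)


# X = price at Bombay, Y = price at Calcutta (figures from the illustration).
bombay_price = estimate_x_from_y(
    y=70,          # given Calcutta price
    mean_x=67,     # average price at Bombay
    mean_y=65,     # average price at Calcutta
    sd_x=3.5,      # standard deviation of Bombay price
    sd_y=2.5,      # standard deviation of Calcutta price
    r=0.8,
)
print(f"most likely Bombay price: {bombay_price:.1f}")
```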

Illustration:

In a partially destroyed laboratory record of an analysis of
correlation data the following results only are legible:

Variance of X = 9
Regression Equations:

$$8X - 10Y + 66 = 0$$
$$40X - 18Y = 214$$

What are

(a) the mean values of X and Y,

(b) the coefficient of correlation between X and Y,
(c) the standard deviation of Y? (I.A.S.)

Solution:

(a) If the two given regression equations are solved as
simultaneous equations the values of X and Y obtained will be
respectively the values of $\bar{X}$ and $\bar{Y}$.

$$8X - 10Y = -66 \qquad \text{(i)}$$
$$40X - 18Y = 214 \qquad \text{(ii)}$$

Multiplying (i) by 5, we get

$$40X - 50Y = -330 \qquad \text{(iii)}$$
$$40X - 18Y = 214 \qquad \text{(iv)}$$

Subtracting (iii) from (iv), we get

$$32Y = 544$$
$$Y = 17, \text{ i.e., } \bar{Y} = 17$$

Substituting the value of Y in equation (i), we get

$$8X - 170 = -66$$
or $8X = 104$

or $X = 13$, i.e., $\bar{X} = 13$

(b) $r = \sqrt{b_{yx} \times b_{xy}}$

We shall have to determine the values of these two regression
coefficients for determining the value of r. From the question
itself, however, it is not clear as to which equation gives the
regression of Y on X and, therefore, we assume the first equation
as giving the regression of Y on X and the second as giving
the regression of X on Y. On this assumption the first of the
regression equations given becomes,

$$10Y = 8X + 66$$

or $Y = \dfrac{8X + 66}{10}$

or $Y = .8X + 6.6$

∴ the regression of Y on X, i.e., $b_{yx} = .8$
The second of the regression equations will take the form of

$$40X = 18Y + 214$$

or $X = \dfrac{18Y + 214}{40}$

or $X = .45Y + 5.35$

∴ the regression of X on Y, i.e., $b_{xy} = .45$

∴ $r = \sqrt{.8 \times .45} = \sqrt{.36} = .6$

(It may be incidentally mentioned that if we had assumed
the first equation as the regression of X on Y and the second of
Y on X then the value of r would have been more than unity.
Therefore, we have to assume the way we have done.)





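(c) Calculation of the standard deviation of Y

$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x}$$

Substituting the values of r and $\sigma_x$ ($\sqrt{9}$ given), we get

$$.8 = .6 \times \frac{\sigma_y}{3}$$

or $.8 \times 3 = .6\,\sigma_y$

or $\sigma_y = \dfrac{2.4}{.6}$

or $\sigma_y = 4$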



EXERCISES

1. Discuss fully what is meant by correlation and distinguish between
positive and negative correlation.

2. What is correlation? Explain how you will use the following methods
in determining correlation:

(i) Graph, (ii) Correlation table, and (iii) Karl Pearson's Coefficient
of Correlation.

(B.Com., Agra)

3. Write short notes on:

(i) Positive correlation,

(ii) Coefficient of concurrent deviation,

(id) !-*«> 

4, Two variate* A' and f when expressed n% deviation* fmut their respec¬ 
tive mean* at r given a* follow*. Tint] die coefficient of < orcMkirrm 
between them. 

X.m ■ -4 -3 -t .1 ft -f-l 4-2 4 3 f 4 

I"-4-3 .3 --4 0 -*-4 4 I 4 2 2 I !<■/?} 

5. The following table shows die maM* obtained i.y JO «tt>dem-< m 

AertHimauuy and Statistic* 

•Student* No. I 2 3 4 5 6 7# <1 Id 

Anminumv 45 70 <i f > 40 W to 30 75 H5 M> 

Ktamtir* 35 90 70 40 95 40 00 DU HO 50 

f ind the coeffkieni of correlation, 

(l) fae;- , /Uftafni, *</',/ f0 r ‘} 

fi. Deline correlation and find the coefficient of « mrelation lietwren the 
sale* and ex pen vet of the following Id firms . 

Firm* 1 2 3 4 5 <i 7 ft 9 10 

Hales 50 M) 55 10 1*5 05 tv- M> fit) 50 

Kxprntei Jl It M If. !o l'i 15 14 13 13 

ft Urn., fi l , t 9i S (t. 7j 

7. Catudair ibr roeflioeni of cot*elation betwern the valor of X and i 

giwn Ijw low : 

V 711 no % 69 59 79 01 

)* 125 137 1% 112 10' 130 123 UM 

l/ic nd working mean Ini V and 11? a* that for >*. 

Af.d . Pe/Ai, /95.V? 10.4} 

ft, (a) Drlsitr tlw oidfirifnl of correlation. Mow would you lodge iu 

reliability ? 

ill) The following ndhlr gnr* the mark* obtained by 12 student* in 
two d die rent tuhjm* in an rxatmnxtion : 

Roll No i 2 3 i 5 o 7 8 9 U) II 12 

M*lh« mafic* 3h 5o 41 4b 59 46 o5 31 i>H M ?M 3f* 

licnn^twii * 62 48 09 53 Vi 30 4 2 t*h 44 3d t»3 71 

CaK uUtr the coefficient of correlation Draw a graph show mg 
tlw icUtmmhip Itelweefi tlie mark* in the two mbteet*. 

■;R.f,aw., Diffo, t.95 J : 10.7} 

9. Calculate the valor irf corllWient *4 tort r latum from the following 
table giving demand and pnee of a certain luxury goutlt. ; Figure* 
Arbitrary.; 




Price m Rt. 2 4 6 10 U> 

Demand in mtk 66 40 30 6 

Calculate a bo the probable error of the rneffirient of correlation 

|6.6J 

10. The following table gives the values of Y calculated from the perfect
relation Y = (6 - X)². Calculate the value of Karl Pearson's coefficient
of correlation for X and Y. Why is your result different from 1?

X     1     2     3     4     5     6
Y    25    16     9     4     1     0


Calculate the probable error 



The following 

table gives the frequency according to age group of 

marks obtained by 

75 students in an intelligence ie*t : 





Age* 


Total 

Test Marks 

19 

20 

21 22 

23 


0-20 

4 

4 

2 1 

1 

12 

2b-40 

3 

5 

4 2 

2 

16 

40-60 

3 

6 

0 5 

3 

25 

60-80 

0 

4 

6 8 

4 

25 

Total 

10 

19 

20 16 

Iff 

75 

Calculate 

the 

coefficient 

of correlation 

between age and 

•nielli- 

genre. 










d., Raj„ t # 55 ) 

16.131 


12. The following table gives the number of candidates obtaining different 


marki in economics and statist its 




Mark* in 


Marks in .Statistics 


Total 

Economics 

30-40 

40-50 

504)» 

60-70 


30-40 

3 

1 

1 

— 

3 

40-30 

2 

6 

1 

2 

11 

50-60 

1 

2 

2 

l 

6 

60-70 

— 

1 

1 

t 

3 

Total 

6 

10 

5 

4 

25 

Calculate 

the coefficient of 

correlation between marls 

in two 

ttihjmt. 









(ff.Csflt-, 

, Dilhu ;o.S5) 

MU 

Find the coefficient of correlation between the 

agr and sum 

assured 

from the following table : 





Age Group 



Sum Assisted 




50 

100 

200 


1,000 

13-24 

18 

20 

6 

2 

— 

25-34 

21 

26 

f» 

5 

i 

35-44 

.10 

9 

3 

6 

t 

45-54 

7 

11 

5 

4 


554*4 

8 

3 

1 

— 

— 

Total 

61 

66 

2 ! 

1? 

2 


si: 





Find also the coefficient of correlation for age IM4 and diicun 
the result*, (B.C -»„ Mki, > 95¥ ) (6.14) 

14. Prom the following table calculate the coefficient of correlation ; 



X 

3 -2 

-1 0 

+ 1 

4-2 

4-3 


-3 





3 





8 

3 

4 


-1 


9 

7 

4 



0 


2 3 

9 

3 



+ 1 

2 

3 6 





4-3 

3 3 






4-3 

4 











[6 10] 

15. 

Calculate the coefficient of correlation 

between 

the 

mark* obtained 


by 60 student* in the terminal ( X) and 

Annual 

(T examinations in 


Trade and Staititici', 








X 





r 

* $ 

s g n 

? 

f 

Total 



«N cn 

»n S 

»- 

cc 



21*30 

4 




4 


31*40 

3 3 

11 



19 


41-30 

2 

10 8 



20 


31*60 

6 

10 5 



21 


61-70 


4 3 



7 


7140 


2 

2 

1 

5 


81.30 



3 

1 

4 


Total 

3 n 

31 17 3 

5 

2 

60 





(fl.Com., 

Hr (6.22] 

16. 

From the 

folio* in* table calculate the coefficient 

of correlation 


between (X and 7 ■. i 

) and <} and (A' and £)• Calculate the probable 


error in each ca»e and interpret your results. 




X 

1 

2 3 

4 


5 


r 

5 

4 7 

2 


6 


< 

3 

2 4 

0 


3 







[6*27] 


17. follow inf are the ranks obtained by 10 student! in two lubjecu ; 


statistics and mathematics. To what extent it the knowledge of students 


in the two subjects related ? 
Statistic* 12 3 4 

r» 

e 

7 

a 

9 

10 

Mathematic* 2 4 13 

3 

9 

7 

10 

6 

H 







K*-«M 

18. Ten student* got the following percentage of 
Economici and Statistic* : 

mark* in 

Principle* of 

Economic* 8 36 98 23 

75 

62 

92 

62 

63 

39 

Statistics 84 3! 91 60 

66 

62 

$t; 

5P. 

33 

47 


find the Pearson*t and Rank** coefficient of cot relation. 


(kf.d., Agm, rftjr) {*M6J 



19. Calculate the rank coefficient of correlation and Pearson's coefficient of
correlation from the following data on height and weight of seven
students. Why is it that the two values are the same? (Figures are
hypothetical):

Height    58    60    62    64    66    68    70

Weight    90    81    99   108   126   117   135

j 6.2?) 

20. Find the rank coefficient of correlation for questions 5 and 9. [6.17]

21. Calculate the coefficient of concurrent deviation from the data of
questions 5, 10 and 19.

22. Mention the rule* for interpretation of Karl Pearson* coefficient of 
corr relation. What it the significance of the coefficient of correlation, 
r, for the following value* hated on the number of observations (a! 50 
and b' m2 

r- -2, 4,9. 

(B Cem., Ages) (6 26] 

23 Calculate the coefficient of concurrent deviation of r 

.v **40 12 

[b\ W.-12 r~ 0 

24 a» If X, ) and £ represent deviation* from the mean of three mutually 

independent variate* having standard deviation* equal, find the 
correlation coefficient between variate* (12* v V and 5*™'4*). 

(IV 4 What will l>e the remit if: 

V 9 **’ 2, 5, l, J-vjr m 3, A** 10, Jr* 1 * 4 and J»c«-2. [6 ?ft] 

25. Find the coefficient of tor r elation, given : 

V * 12 £v'«23 

Jt # »*v92l Jr' w — 24 

35»y-»-43S 1*'*«734 [6.RJ 

26 From the following data compute coefficient of correlation between 
a and j : 



* sertea 

» *enrt 


No. of item* 

15 

15 


Arithmetic mean 

2 5 

18 


Squares of deviation from mean 

136 

128 


Sum of product* of deviation 
of # andj> *erie* fremt their 
respective arithmetic mean 


122 



B Ctim., Delhi 

. '05* 

[6.9’ 

(a) Coefficient of correlation between two variates 
Their co-variance ** ?-6- The variance of t is 9. 

r*nd r i, 0 ?fl. 
Find ih« tun* 


dard deviation of r serif*. 

dd Calculate the number of items for which Stan¬ 
dard deviation ofjr^-4 and (6,20) 

2lh Tlie fottowifig table give* the distribution of the total papulation amt 
then* who are wholly m partially blind among litem. Find nut it 
there ■* any cort elation Ik* tween age and M i mines*. 




A >t e 0-10 10-20 20-30 30-40 40-50 50-60 60-70 70-60 

No of pmcms 

in '000 100 60 4C» 36 24 11 6 3 

Blind 55 10 40 40 % 22 18 IS 

B Com t Agra, i . 955 ) [&.23J 


n. 


JO. 


31. 


32. 


n 


K 


Cm leu laic the coefficient of correlation for abort time otcilUtioma 4ae» 
tween the indices of import of sugar rx) and export* of oilseed* (jj, 
1947-54 taking a sesen year mov ing average, 


Year 

X 

J 


Year 

X 

r 

mi 

115 

75 


1946 

B0 

50 

38 

IU» 

7H 


4 7 

120 

75 

1** 

114 

80 


48 

126 

78 

40 

103 

60 


to 

no 

00 

41 

J02 

58 


50 

119 

65 

42 

107 

72 


51 

no 

50 

43 

*4>i 

70 


52 

J05 

55 

44 

95 

60 


53 

108 

58 

1945 

f C 

56 


1954 

115 

65 

Calculate Karl Pearson 
equations from the Inllo 

*i coefficient of 
wing data 

correlation and the 

regression 

Age of husband 

m 

19 20 2) 

22 

23 24 

25 26 27 

Age of wife 

17 

17 18 IB 

IB 

10 19 

20 21 22 


III Con., Aitahak*d, |7.2] 

Mean soil temperature and germination interval time between sowing 
and appearance above ground) for winter wheat 1926-2 7 for 12 place* 
are recorded below : 

Mean Soil Temp. S7 42 311 42 45 42 44 40 46 44 4 3 40 

Number of day* 10 26 41 20 27 27 19 IH If) 31 29 33 

Obtain the regression equations uf germination interval on mean 
mil temperature aind «omnium on your result 

fH.Ctm., Dtlhi, trjtf) (7.3} 
In the following table are recorded data showing the test score* made 
by salesmen on an intelligence test and their weekly sales : 

Salesmen J 2 3 4 5 6 7 « 9 10 

Test Stores 40 70 SO 60 80 50 90 40 60 Hi 

Sale*i 000 1 J -» 6 0 4 5 5 0 4 5 2 0 5*5 3 0 4*5 3*0 

Calculate *he rrgmstnn line of sales on test twnrr and estimate the 
r.vm probable weekly sale? if a salesman make* a snore of 70. 


‘14 f*. 1.94^ I7.CJ 

find r»vit the equation* »-f two regression line* from the fallowing data ; 
Average Price Calcutta 65, Bombay 67 

Standard Deviation Calcutta J‘\ Oomtiay 3 5 


f bet wren tbr ttvu prices 0 6 


im] 


Draw two »egression lines on a graph paper. Mean height 5ti07, 
Mew age *i-98, 


Standard deviation of height *-Y 26, 

Standard dev tat run ut age 2*59 and r* t» 890. 


* Af A<> Hat., #$>> (7,5} 

35. Follow mg table give* the age c.f h ml .a ru b am! wives (be 50 newly 

married soupb*. find the *egim%k>n lines. 





Age of Wive* Age of Husband* Toni 

20-25 2 5-IK) 30-35 

16-20 6 14 . 23 

20-24 9 II 3 20 

24-20 - - 7 7 

Total 15 25 10 50 

[*■#1 

36. Calculate the coefficient ofcorr elation and obi * in the Lino of irgres* 
lion for the following data : 

* f 2 3 4 5 0 7 ft 9 

j 9 8 10 12 H 13 14 16 15 

Obtain an estimate of j which should correspond cm iht avenge 

to vm6'2. tyw {7,7] 

37. Given the following data, find what will be the probable yield when 
the rainfall is 2T. 

Rainfall Production 

Mean 25" 40 units per acre 

Standard Deviation 3" t> units 

Coefficient of correlation between rainfall and production«»Li B 

; M.Cpw., Jjfta 1 l?.13j 

38. A study of 'wheat prices at Hapur a ml Kanpm yield* the* following 

data : 

Hapur kanpi; 

Average Price Hi, 2'463 Rs. 2’7il7 r-*771 

Standard Deviation 12f> *2fl7 

Estimate from the above data the likely pro e of w heat ja at 
Hapur corresponding to the price R*. 2 334 per ruaund at Kanpur ; 
'b) at Kanpur corresponding to the pike of Ri 34)52 per maurid at 
Hapur, ■ Af.,4 , Jgnj, (714J 

39. Given the following data calculate the expected value* of J when 
X~ 12. 

x r 

Average 7.fi 118 

Standard Deviation 3 6 2 5 r +0-99 

U 4 . iysi) \ : l *l 

[Minf. First calculate the regression equation from lbr formula 


fwl .V— X\ '*nd estimate the value of f for given value of .V { 


40, 


41. 


For two variables .V and Y the regression equation of *Y on 1 is 
X r 5 P“ 7, regression equation of t on A'it ? .Yt 17. 


Find the means of A'and Tand coefficient of correlation 

X and / . 


bet ween 

r iM 


In a partially destroyed laboratory record of an analysis of correlation
data the following results only are legible:

Variance of X = 9

Regression equations are 8X - 10Y + 66 = 0
40X - 18Y = 214

What were

(a) the mean values of X and Y,
(b) the standard deviation of Y,

(c) the coefficient of correlation between X and Y?

I) U. f tysf) (7J6j 

12. TV iwi» re-gee** Jin** between heighf X in inches) and weigM iT 
iti lb*. ; of some Mutitn's, atf. 4;- ! 5a ruiO *>0 ami 21U-3 jf-~973«>0. 

find the ftiWJi height and weight of the group. A bn estimate the 
weight when height t» 70 inches and height when weight is 130 lb, 

17,17} 

43, Krgrr.»sK»i of saving .S 4 ot a family over income j'j may fir expressed 

A) > j .1 t * 5 -d/K <1 (otiiUut, \ti 4 random sample of 

Mai f the v the saving* n one quarter < f the variance 

?jf the income and the i.'.>r(!icirnt of onreUtion I wet wren and ijii 
found to V 0 10 ObUn the estimate of m. 

i.VM.. Dtlhi, t<,s $: p.ltJJ 

41, Jot there variable .V, i arr.i .7 following data hat been observed : 

A-."’) 1* 1 V V j .... 4 J 

1 *». ih ’■ 3 2 k f '? 

£ i 20 'v-n 

Mini ihr nrvi»‘ t 4»m; of i, mi ;, itgKiuon on ; on ; and regression 
A A mi x (7,2tlj 

45, 1 rom she data »'f <„£ 44 estimate ihr range within which value of i 

wilt nc when .a .> Ml am.! b : I Mb [720] 

lt> Comment m?j riv fnllnwmg : 

a: The regr r.v-fon «r>e ttfo km htm-d pi^Mnrf on heart weight wit 
much gteitn tiiRii *l*4i M body weight, confommy our belief that 
heart weight pla\ a much largrr pari titan fonly weight in con* 
ti oiling blond p« **«-.;> r. A/ A.> Dtiht, !&$$) 

h 1 he coeffirirn* •*!'con elation rand rand valid i it the >amr, ao 

the rate <>{ fo< iea.te of * with respect u> and with respect U>y i» 

the same. ' 7.23,24) 

4? Th< vahifs obtained u> n»ea>nf <*meni of character A and l on each of 
MM) mdiiidusb led to the table below : 


Central Value V 

10 

VI 

H 

lb 

IB 

20 

No. of AiWivath-m 

40 

90 

m 

Ui) 

ISO 

40 

Average Y 

3 f> 

iy\ 

Hr 7 

7‘2 

75 



Obtain the line** i^gieuion reunion of Ton X. J7.21 J 

4th for T *0 undent* m( a c-laas the irgmsiwi equation of mark* in tlatittics 
‘x • *m marks in actmimancr I}\ is 3T~5.Y-f IHU^0, The mean of 
matfct in Rie«»imtamy it 44 arid the variance of marks in statistics is 
% r '*6 of ihf mi t*nrf of i,aib in accountancy. Mod the mean marks 
of sutiMk* and the a*Bitirm *A correlation between marks in the 
lubjecU p,lt| 



Chapter 19 

Association of Attributes 


In the chapter on 'Classification' it has been pointed out
that the characteristics possessed by the individual items
of a data may be classified into (i) numerical, and (ii) descriptive.

Characteristics which are capable of quantitative measurements,
e.g., age, weight, wages, length, etc., are put in the first
category and those which cannot be quantitatively measured,
e.g., sex, religion, complexion, etc., are put in the second
category.

In the former case the observations themselves are quantitative
in character while in the latter case the quantitative
character arises solely in the process of counting. If, for
example, we measure the heights of the students of a class such
measures are themselves quantitative in character. On the
other hand, if we record the number of students who are tall
and the number of students who are not tall then the quantitative
character arises only in the process of counting.

Observations based on numerical characteristics are termed
statistics of variables, while observations based on descriptive
characteristics are termed statistics of attributes. So far
we have dealt with the classification, summarisation, interpretation
and correlation of the statistics of variables. In this
chapter we will discuss the methods of determining if there
exists any relation between different attributes.

Dichotomy and Notation 

Descriptive characteristics may be classified according to
the presence or absence of an attribute. In this way two
distinct and mutually exclusive classes are formed. Such a
classification is termed classification by dichotomy. Now
it is essential to assign simple notation for the classes and for
their observed results.

Attributes are generally denoted by capital letters, viz., A, B, C,
etc. An individual possessing the attribute A will be called
'A'. A group or a class in which all the individuals possess
attribute A will be called 'the class A'. Likewise we use
small letters a, b, c, ... or Greek letters α, β, γ, ... for the
absence of attributes. An individual who does not possess
attribute A will be called 'a'. A class in which none of the
members possesses attribute A will be called 'the
class a'.

Thus if A stands for the attribute 'male', 'a' would represent
female ; if B stands for 'married', 'b' would stand for
unmarried. Combinations of several attributes will be denoted
by writing together the letters representing the several attributes.
As an illustration, if 'A' stands for 'male' and 'B' for 'married'
then :

AB means married male ;
Ab means unmarried male ;
aB means married female ; and
ab means unmarried female.

If we add a third attribute, literacy, denoted by 'C', then :
ABC will denote a literate married male ;
abc will denote an unmarried woman who is illiterate ;
and so on.

Class frequencies are represented by enclosing the corresponding
class symbols in brackets. Hence (A) stands for
the number of A's, i.e., the number of individuals who possess
attribute A ; (ABC) stands for the number of those persons who
possess all the three attributes.

Positive and Negative Attributes

Positive attributes are those which are represented by
capital letters A, B, C, ... ; whereas negative attributes are
represented by small letters a, b, c, ... or Greek letters α, β, γ, ...
Similarly, A, AB, ABC are positive classes and a, ab, abc are
negative classes. If two classes are such that every attribute
in the symbol for the one is the negative or contrary of the
corresponding attribute in the symbol for the other, they are
termed contrary classes, e.g., AB and ab, Ab and aB, ABC
and abc, etc.

Order of Classes 


The class in which only one attribute is considered is called
the class of the first order. The class in which two attributes
are considered is called the class of the second order. The
class in which three attributes are considered is called the class
of the third order.

Thus :

A, B, C, a, b, c                      ... 1st order

AB, Ab, aB, ab
AC, Ac, aC, ac                        ... 2nd order
BC, Bc, bC, bc

ABC, ABc, AbC, Abc,
aBC, aBc, abC, abc                    ... 3rd order


Similarly, (A), (AB) and (ABC) are class frequencies of
the 1st, 2nd and 3rd order respectively. The classes specified
by attributes of the highest order are called the 'ultimate
classes' and their frequencies, the 'ultimate class frequencies'.
Suppose we record the three attributes A, B, C of the population
of a particular town, where :

A stands for male and a stands for female,

B stands for married and b stands for unmarried,

C stands for literate and c stands for illiterate,

N stands for the total population.

It is clear that the total population will be equal
to the total number of men plus the total number of women :

N = (A) + (a)                         ... (i)

Similarly,

N = (B) + (b)                         ... (ii)

N = (C) + (c)                         ... (iii)

In (i) we say that the whole population consists of men
and women. Now men may be married or unmarried and
as such the total number of men will be equal to the total of
married men and unmarried men. Thus :

(A) = (AB) + (Ab)                     ... (iv)

and similarly

(a) = (aB) + (ab)                     ... (v)

Substituting (iv) and (v) in (i), we get

N = (AB) + (Ab) + (aB) + (ab)         ... (vi)

Now married men (AB) may be literate or illiterate and,
therefore, the number of married men (AB) will always be
equal to the sum of literate married men (ABC) and illiterate
married men (ABc).

Thus :

(AB) = (ABC) + (ABc)

(Ab) = (AbC) + (Abc)

(aB) = (aBC) + (aBc)

(ab) = (abC) + (abc)

Substituting these values in (vi), we obtain

N = (ABC) + (ABc) + (AbC) + (Abc) + (aBC) + (aBc) + (abC)
    + (abc)                           ... (vii)

From the above we draw the following conclusions :

N = Sum of the positive and negative frequencies of any one
    attribute (the first order). See (i).

or N = Sum of all the frequencies of the second order, when
    only two attributes are considered. See (vi).

or N = Sum of all the frequencies of the third order, when
    only three attributes are considered. See (vii).

The following illustrations will clearly demonstrate the 
above conclusions ; 

Illustration :

Given the following ultimate class frequencies, find the
frequencies of the positive and negative classes and the total
number of observations :

(AB) = 125 ; (Ab) = 60 ; (aB) = 100 ; (ab) = 35.

Solution :

The classes for which frequencies have to be found are :

(A), (a), (B), (b) and N.

Now,

N = (AB) + (Ab) + (aB) + (ab)         [See (vi)]

(A) = (AB) + (Ab)                     [See (iv)]

(a) = (aB) + (ab)                     [See (v)]

Similarly

(B) = (AB) + (aB)                     ... (viii)

(b) = (Ab) + (ab)

Substituting the given values we get :

N = 125 + 60 + 100 + 35 = 320
(A) = 125 + 60 = 185
(a) = 100 + 35 = 135
(B) = 125 + 100 = 225
(b) = 60 + 35 = 95

Thus N = 320, (A) = 185, (B) = 225, (a) = 135, (b) = 95.

(For proof N = (A) + (a) = (B) + (b)
             = 185 + 135 = 225 + 95
    i.e. 320 = 320)
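The substitution above can be sketched in a few lines of Python. This is only an illustrative check of relations (iv) to (vi); the function name `lower_order_frequencies` and the dictionary keys are assumptions made for this sketch, not notation from the text.

```python
def lower_order_frequencies(AB, Ab, aB, ab):
    """Build N and the first-order class frequencies from the four
    ultimate class frequencies of two attributes A and B."""
    return {
        "N": AB + Ab + aB + ab,   # relation (vi)
        "(A)": AB + Ab,           # relation (iv)
        "(a)": aB + ab,           # relation (v)
        "(B)": AB + aB,
        "(b)": Ab + ab,
    }


freq = lower_order_frequencies(AB=125, Ab=60, aB=100, ab=35)
for name, value in freq.items():
    print(name, "=", value)

# The proof step of the illustration: N = (A) + (a) = (B) + (b)
print(freq["(A)"] + freq["(a)"] == freq["(B)"] + freq["(b)"] == freq["N"])
```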

Illustration :

In the B.Com. 1st year class of a college there were 200 students.
Their results in the quarterly, half-yearly and the annual
examinations were as follows :

80 students passed the quarterly examination,

75 students passed the half-yearly examination,

96 students passed the annual examination,

25 students passed all the three examinations,

46 students failed in all the three examinations,

42 students failed in the 1st two but passed the annual
examination,

29 students passed the 1st two but failed in the annual
examination.

Find how many students passed at least two examinations.

Solution :

Denoting :

Success in quarterly examination by A and failure by a,
Success in half-yearly examination by B and failure by b,
Success in annual examination by C and failure by c,
the given data are reduced to :

N = 200         (ABC) = 25

(A) = 80        (abc) = 46

(B) = 75        (abC) = 42

(C) = 96        (ABc) = 29

We have to find out the value of :

(aBC) + (AbC) + (ABc) + (ABC)

Now,

(C) = (AC) + (aC)

(C) = (ABC) + (AbC) + (aBC) + (abC)

∴ (AbC) + (aBC) = (C) - (ABC) - (abC)
                = 96 - 25 - 42 = 29.

Hence [(aBC) + (AbC)] + (ABc) + (ABC)
      = 29 + 29 + 25
      = 83

Thus the number of students who passed at least two
examinations is 83.
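The arithmetic of this solution can be retraced as a short Python sketch. The variable names are assumptions chosen to mirror the class symbols, and the figures are the ones used in the solution: (C) = 96, (ABC) = 25, (abC) = 42 and (ABc) = 29.

```python
# Figures used in the solution
C = 96      # passed the annual examination
ABC = 25    # passed all three examinations
abC = 42    # failed the first two, passed the annual
ABc = 29    # passed the first two, failed the annual

# (C) = (ABC) + (AbC) + (aBC) + (abC), so the two unknown
# third-order frequencies can only be found as a sum.
AbC_plus_aBC = C - ABC - abC

# Students passing at least two examinations:
# (aBC) + (AbC) + (ABc) + (ABC)
at_least_two = AbC_plus_aBC + ABc + ABC
print(AbC_plus_aBC, at_least_two)
```

Note that (AbC) and (aBC) are never obtained separately; only their sum is needed for the answer.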

Sometimes with the help of nine squares (given below) we
can quickly find out the required class frequencies.

            A        a        N

    B     (AB)     (aB)     (B)

    b     (Ab)     (ab)     (b)

    N     (A)      (a)       N

With (AB) = 125, (aB) = 100, (Ab) = 60 and (ab) = 35 the table
becomes :

            A        a        N

    B      125      100      225

    b       60       35       95

    N      185      135      320

We get :

(A) = 185
(B) = 225
(a) = 135
(b) = 95
N = 320



Consistency of Data

A set of given figures is said to be consistent if no class
frequency calculated from them is negative. As a class
frequency of any order can be expressed as a sum of two or
more ultimate class frequencies, the condition for consistency
of data becomes that no ultimate class frequency is negative.

Illustrations : (i) Test for consistency : N = 100, (A) = 70,
(B) = 60, (AB) = 15.

Solution : This is a case of two attributes. The remaining
ultimate class frequencies are (Ab), (aB) and (ab).

(Ab) = (A) - (AB) = 70 - 15 = 55
(aB) = (B) - (AB) = 60 - 15 = 45
(ab) = N - (A) - (B) + (AB) = 100 - 70 - 60 + 15 = -15.

As (ab) is negative the given data are inconsistent.

(ii) Are the data given below consistent ?

N = 1,000, (A) = 500, (B) = 550, (C) = 450,
(AB) = 200, (BC) = 250, (AC) = 150, (ABC) = 120.

Solution : There are three attributes. So the remaining
ultimate class frequencies will be (ABc), (AbC), (Abc), (aBC),
(aBc), (abC) and (abc).

Now (abc) = N - (A) - (B) - (C) + (AB) + (BC) + (AC) - (ABC)
          = 1,000 - 500 - 550 - 450 + 200 + 250 + 150 - 120
          = 1,600 - 1,620 = -20

As (abc) is negative, it is not necessary to calculate the
other ultimate class frequencies. As soon as any one of the
ultimate class frequencies comes out to be negative we can infer
that the given data are inconsistent.
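The two-attribute test of illustration (i) can be sketched as follows. The helper names `ultimate_two` and `is_consistent` are assumptions introduced for this sketch; the rule applied is simply the one stated above, that no ultimate class frequency may be negative.

```python
def ultimate_two(N, A, B, AB):
    """Ultimate class frequencies for two attributes, derived from
    N, (A), (B) and (AB)."""
    return {
        "(AB)": AB,
        "(Ab)": A - AB,
        "(aB)": B - AB,
        "(ab)": N - A - B + AB,
    }


def is_consistent(ultimate):
    """Data are consistent only if no ultimate frequency is negative."""
    return all(value >= 0 for value in ultimate.values())


two = ultimate_two(N=100, A=70, B=60, AB=15)
print(two)
print("consistent:", is_consistent(two))
```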

In problems involving three or more attributes this procedure
for testing consistency becomes very lengthy. The condition
that no ultimate class frequency be negative can be
written in terms of mathematical inequalities which are easy
to apply. Sometimes incomplete data (i.e., data from which
all the ultimate class frequencies cannot be calculated) may be
given. In such a case this procedure of testing consistency will
not be applicable. But the inequalities referred to above give
rise to a new set of inequalities, which may be applicable in
these cases.

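For three attributes A, B, C the inequalities are :

(1) (ABC) ≥ 0                          otherwise (ABC) will be negative

(2) (ABC) ≤ (AB)                       otherwise (ABc) will be negative

(3) (ABC) ≤ (BC)                       otherwise (aBC) will be negative

(4) (ABC) ≤ (AC)                       otherwise (AbC) will be negative

(5) (ABC) ≥ (AB) + (AC) - (A)          otherwise (Abc) will be negative

(6) (ABC) ≥ (AB) + (BC) - (B)          otherwise (aBc) will be negative

(7) (ABC) ≥ (BC) + (AC) - (C)          otherwise (abC) will be negative

(8) (ABC) ≤ (AB) + (AC) + (BC)
            - (A) - (B) - (C) + N      otherwise (abc) will be negative

These relations give four new relations for testing inconsistency,
obtained by combining (1) with (8), (5) with (3), (6) with (4)
and (7) with (2) :

(9) (AB) + (AC) + (BC) ≥ (A) + (B) + (C) - N

(10) (AC) + (AB) - (BC) ≤ (A)

(11) (AB) + (BC) - (AC) ≤ (B)

(12) (AC) + (BC) - (AB) ≤ (C)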

Illustrations : (i) Can there be inconsistency in the data
given below :

N = 100, (A) = 40, (AB) = 38, (AC) = 35, (B) = 39 and (BC) = 37.

Solution : As (ABC) and (C) are not given, relation Nos.
1 to 9 and 12 are not applicable. Here only relation Nos.
10 and 11 can test the inconsistency. Applying relation No. 10
we get :

(AC) + (AB) - (BC) ≤ (A)

or 35 + 38 - 37 = 36 ≤ 40 ......... satisfied

Applying relation No. 11 we get :

(AB) + (BC) - (AC) ≤ (B)

or 38 + 37 - 35 = 40 ≤ 39 ......... not satisfied

Hence there is inconsistency in the data.
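A small Python sketch of this check is given below. The function name `check_relations_10_11` is an assumption for illustration, and the figures are the ones substituted above: 35 + 38 - 37 against 40, and 38 + 37 - 35 against 39.

```python
def check_relations_10_11(A, B, AB, AC, BC):
    """Apply the two derived relations that need neither (ABC) nor (C):
    (10) (AC) + (AB) - (BC) <= (A)
    (11) (AB) + (BC) - (AC) <= (B)
    Returns a dict mapping the relation number to True if satisfied."""
    return {
        10: AC + AB - BC <= A,
        11: AB + BC - AC <= B,
    }


result = check_relations_10_11(A=40, B=39, AB=38, AC=35, BC=37)
print(result)
print("inconsistent:", not all(result.values()))
```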

(ii) Apply the inequality method for testing inconsistency to
illustration No. (ii) given earlier under
'Consistency of Data'.

Solution : Here N = 1,000, (A) = 500, (B) = 550, (C) = 450,
(AB) = 200, (BC) = 250, (AC) = 150, (ABC) = 120.

Here all the 12 relations can be applied. By inspection
relation Nos. 1 to 4 are satisfied.

No. (5) gives 120 ≥ 200 + 150 - 500 ......... satisfied.

No. (6) gives 120 ≥ 200 + 250 - 550 ......... satisfied.

No. (7) gives 120 ≥ 250 + 150 - 450 ......... satisfied.

No. (8) gives 120 ≤ 200 + 250 + 150 - 500 - 550 - 450 + 1,000
or 120 ≤ 1,600 - 1,500 = 100 ......... not satisfied.

Hence the data are inconsistent. That relation No. 8 is not satisfied
means (abc) will be negative, and the same result was obtained
earlier.
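Each of the eight inequalities is equivalent to one ultimate class frequency being non-negative, so the whole test can be sketched by computing all eight ultimate frequencies. The function name `ultimate_three` is an assumption made for this sketch.

```python
def ultimate_three(N, A, B, C, AB, AC, BC, ABC):
    """All eight ultimate class frequencies for three attributes.
    Each entry is non-negative exactly when the matching
    inequality (1) to (8) is satisfied."""
    return {
        "(ABC)": ABC,                                    # (1)
        "(ABc)": AB - ABC,                               # (2)
        "(aBC)": BC - ABC,                               # (3)
        "(AbC)": AC - ABC,                               # (4)
        "(Abc)": A - AB - AC + ABC,                      # (5)
        "(aBc)": B - AB - BC + ABC,                      # (6)
        "(abC)": C - AC - BC + ABC,                      # (7)
        "(abc)": N - A - B - C + AB + AC + BC - ABC,     # (8)
    }


u = ultimate_three(N=1000, A=500, B=550, C=450,
                   AB=200, AC=150, BC=250, ABC=120)
negative = [name for name, value in u.items() if value < 0]
print(u)
print("negative classes:", negative)
```

Only (abc) comes out negative, which is the failure of relation No. 8 found above.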

Association and Dissociation

We make use of the technique of correlation to study the
relationship between characteristics which are capable of
quantitative measurement. Attributes such as sex, blindness, etc.,
cannot be measured quantitatively and only their presence or
absence can be observed. It is in such cases that the method
of association of attributes is applied to determine if any
relationship exists between them.

In statistics X and Y will be said to be associated only if
they appear together in a larger number of cases than is to be
expected if they are independent. The mere fact that X and
Y are found together in a fairly high proportion will not be
enough to warrant any kind of association between them.
What is necessary is that this proportion should be higher than
can occur by chance.

In this chapter we discuss three methods of finding
if there is any association between two attributes :

(i) Comparison of the actual and expected frequencies ;

(ii) Comparison of the proportions of (say)
attribute A in the universe of B and b ; and

(iii) Computation of the coefficient of association.

1. Comparison of Expected and Observed Frequencies

Let us first discuss the method of finding out expected
frequencies. If we throw a die, we know that the chance of
our getting an ace is 1/6. If we throw a single die 10 times,
then the expected number of aces = 1/6 × 10 = 10/6. Now this is
the expectation of aces in ten throws of a single die. Thus the
expectation is the product of the probability and the total number
of observations.

Now if we study one attribute (say) A in a universe N then
the probability of (A) = (A)/N.

Similarly the probability of (B) = (B)/N, and of (C) = (C)/N.

Probability of (A), (B) and (C) combined

    = (A)/N × (B)/N × (C)/N

and expectation of (A), (B) and (C) combined

    = (A)/N × (B)/N × (C)/N × N

Attributes A and B are independent if (AB), the actual
observation = (A) × (B) / N (expectation).

Attributes a and b are independent if (ab), the actual
observation = (a) × (b) / N (expectation).

Attributes A and b are independent if (Ab), the actual
observation = (A) × (b) / N (expectation).

Attributes a and B are independent if (aB), the actual
observation = (a) × (B) / N (expectation).

Hence,

(AB) × (ab) = (A)(B)/N × (a)(b)/N

            = (A)(b)/N × (a)(B)/N

            = (Ab) × (aB)

∴ (AB)(ab) = (Ab)(aB)

If this test is satisfied it means attributes A and B are
independent.

Now we take some examples.

Illustration :

Show whether A and B are associated or not in the following
cases :

(i) N = 50,000, (A) = 23,500, (B) = 31,000, (AB) = 16,000 ;

(ii) (AB) = 2,560, (aB) = 7,680, (Ab) = 480, (ab) = 1,440 ;

(iii) (A) = 80, (B) = 70, (a) = 20, (b) = 30, (AB) = 60,
(Ab) = 20, (ab) = 10, (aB) = 10 and N = 100.

Solution :

(i) Expected frequency of (AB) = (A) × (B) / N

        = 23,500 × 31,000 / 50,000

        = 14,570

But the actual observation (AB) is 16,000 which is more than
the expectation. Hence they are not independent.

(ii) (AB) × (ab) = 2,560 × 1,440 = 3,686,400
     (Ab) × (aB) = 480 × 7,680 = 3,686,400
     ∴ (AB) × (ab) = (Ab) × (aB)

Thus A and B are independent.

(iii) From the given results we can construct a nine-squared
table as :

TABLE I
Observations

            A       a       N

    B      60      10      70

    b      20      10      30

    N      80      20     100

Now we find the expectation :

Expectation of (AB) = (A) × (B) / N = 80 × 70 / 100 = 56

Expectation of (ab) = (a) × (b) / N = 20 × 30 / 100 = 6

We may now construct a table of expectations as follows :

TABLE II
Expectations

            A       a       N

    B      56      14      70

    b      24       6      30

    N      80      20     100

If we compare Table I with Table II, we find A and B
are associated,

∵ the expectation is 56 and the observation is 60.

Similarly a and b are also associated.
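The three cases can be checked with a short Python sketch. The function names `expected_table` and `nature_of_association` are assumptions for illustration; exact fractions are used so that the comparison with the observed frequency is not affected by rounding.

```python
from fractions import Fraction


def expected_table(A, a, B, b, N):
    """Expected second-order frequencies under independence:
    each is the product of the two first-order frequencies over N."""
    return {
        "(AB)": Fraction(A * B, N),
        "(Ab)": Fraction(A * b, N),
        "(aB)": Fraction(a * B, N),
        "(ab)": Fraction(a * b, N),
    }


def nature_of_association(observed, expected):
    """Compare an observed frequency with its expectation."""
    if observed > expected:
        return "positive"
    if observed < expected:
        return "negative"
    return "independent"


# Case (i): only (AB) is needed
expected_i = Fraction(23500 * 31000, 50000)
case_i = nature_of_association(16000, expected_i)

# Case (ii): the cross-product test (AB)(ab) = (Ab)(aB)
case_ii_independent = 2560 * 1440 == 480 * 7680

# Case (iii): the full table of expectations
table_ii = expected_table(A=80, a=20, B=70, b=30, N=100)
case_iii = nature_of_association(60, table_ii["(AB)"])

print(expected_i, case_i)
print(case_ii_independent)
print(table_ii, case_iii)
```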


Positive and Negative Association 

When the actual observation is greater than the expectation it
means there is positive association. If the actual observation is
less than the expectation it means negative association or
dissociation. It should be noted that dissociation does not
mean independence. In the above example A and B, and a
and b, are positively associated because the actual frequencies
(AB) and (ab) (60 and 10 respectively) are greater than 56 and 6,
the expected frequencies.

2. Method of Proportion

"If there is no sort of relationship of any kind between two
attributes A and B, we expect to find the same proportion of
A's amongst the B's as amongst the not-B's. We may
anticipate, for instance ... the same proportion of heads
whether a coin be tossed with the right hand or the left."
Yule and Kendall.

Two such unrelated attributes are said to be independent,
and we have accordingly a criterion of independence for A
and B :

(AB)/(B) = (Ab)/(b)

Similarly (aB)/(B) = (ab)/(b) and (Ab)/(A) = (ab)/(a),

and (AB)/(A) = (aB)/(a).

The student can easily remember these relationships by the
table¹ shown below.

Attribute      B        b       Total

    A        (AB)     (Ab)      (A)

    a        (aB)     (ab)      (a)

  Total      (B)      (b)        N

Illustration :

Find out whether A and B are independent if :

(A) = 4,900, (AB) = 2,940, (a) = 5,700 and (aB) = 3,800.

Solution :

Proportion of B in A's, i.e., (AB)/(A) = 2,940/4,900 = 3/5

and proportion of B in a's, i.e., (aB)/(a) = 3,800/5,700 = 2/3.

Thus (AB)/(A) is less than (aB)/(a), and
there is negative association between A and B.

Illustration :

Out of 70,000 literates in a particular district of India the
number of criminals was 500. Out of 930,000 illiterates in
the same district the number of criminals was 15,000. On the
basis of these figures, do you find any association between
illiteracy and criminality ?

¹ Yule and Kendall.

Solution :

Denoting illiteracy by A and literacy by a ; criminals by B
and non-criminals by b, the data can be written as :

(A) = 930,000
(a) = 70,000
(AB) = 15,000
(aB) = 500

Now the percentage of illiterates who are criminals

    = (AB)/(A) × 100 = 15,000/930,000 × 100 = 1.6%.

And the percentage of literates who are criminals

    = (aB)/(a) × 100 = 500/70,000 × 100 = 0.7%.

A comparison of the above two percentages clearly reveals
the fact that illiteracy and criminality are positively associated.
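The method of proportion used in the last two illustrations can be sketched in Python as below. The function name `association_by_proportion` is an assumption for this sketch, and the second call takes the 930,000 illiterates stated in the problem.

```python
from fractions import Fraction


def association_by_proportion(AB, A, aB, a):
    """Compare the proportion of B among the A's with the
    proportion of B among the a's (the not-A's)."""
    in_A = Fraction(AB, A)
    in_a = Fraction(aB, a)
    if in_A > in_a:
        verdict = "positive"
    elif in_A < in_a:
        verdict = "negative"
    else:
        verdict = "independent"
    return in_A, in_a, verdict


first = association_by_proportion(AB=2940, A=4900, aB=3800, a=5700)
second = association_by_proportion(AB=15000, A=930000, aB=500, a=70000)

print(first)
# Percentages for the literacy and criminality data
print(round(float(second[0]) * 100, 1), round(float(second[1]) * 100, 1),
      second[2])
```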

3. Coefficient of Association

Like Karl Pearson's Coefficient of Correlation, Yule has
given a simple formula for finding out the Coefficient of
Association as :

Q = [(AB) × (ab) - (Ab) × (aB)] / [(AB) × (ab) + (Ab) × (aB)]

If the attributes are independent the coefficient is zero ; if
the attributes are completely associated the coefficient is +1 ;
if the attributes are completely dissociated it is -1.

Illustration :

Compare the association between literacy and unemployment
in urban and rural areas from the following observations :

                                    Urban         Rural

Total adult males                  25 lakhs     200 lakhs

Literate males                     10   "        40   "

Unemployed males                    5   "        12   "

Literate and unemployed males       3   "         4   "

Solution :

Let A denote literacy and B unemployment. We have :

            Urban    Rural

N             25      200

(A)           10       40

(B)            5       12

(AB)           3        4

From the above,

Urban : (AB) = 3

(Ab) = (A) - (AB) = 10 - 3 = 7
(aB) = (B) - (AB) = 5 - 3 = 2
(ab) = N - (A) - (aB)
     = 25 - 10 - 2
     = 13

            A       a       N

    B       3       2       5

    b       7      13      20

    N      10      15      25

Substituting these values :

Q = [(AB) × (ab) - (aB) × (Ab)] / [(AB) × (ab) + (aB) × (Ab)]

  = [(3 × 13) - (2 × 7)] / [(3 × 13) + (2 × 7)]

  = 25/53

  = +0.47

Rural : (AB) = 4

(Ab) = (A) - (AB) = 40 - 4 = 36

(aB) = (B) - (AB) = 12 - 4 = 8

(ab) = N - (A) - (aB)
     = 200 - 40 - 8
     = 152.

            A       a       N

    B       4       8      12

    b      36     152     188

    N      40     160     200

Substituting these values :

Q = [(4 × 152) - (36 × 8)] / [(4 × 152) + (36 × 8)]

  = 320/896

  = +0.357

Thus there is a positive association between literacy and
unemployment both in rural and urban areas. It is, however,
greater in the urban areas than in the rural.
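The computation of Yule's coefficient for the two areas can be sketched as follows. The function names `second_order_from_totals` and `yules_q` are assumptions for this sketch; the formula is the one given above.

```python
def second_order_from_totals(N, A, B, AB):
    """Fill the 2 x 2 table from N, (A), (B) and (AB)."""
    Ab = A - AB
    aB = B - AB
    ab = N - A - aB      # equivalently N - (A) - (B) + (AB)
    return AB, Ab, aB, ab


def yules_q(AB, Ab, aB, ab):
    """Yule's coefficient of association Q."""
    return (AB * ab - Ab * aB) / (AB * ab + Ab * aB)


urban_cells = second_order_from_totals(N=25, A=10, B=5, AB=3)
rural_cells = second_order_from_totals(N=200, A=40, B=12, AB=4)

q_urban = yules_q(*urban_cells)
q_rural = yules_q(*rural_cells)

print(urban_cells, round(q_urban, 2))
print(rural_cells, round(q_rural, 3))
```

Both coefficients are positive, and the urban one is the larger, in line with the conclusion of the illustration.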



EXERCISES

1. From the following data calculate all the remaining possible class
frequencies :

N = 100, (A) = 50, (B) = 70 and (AB) = 30. [8.1]

2. From the following set of ultimate class frequencies calculate the set of
positive class frequencies.

(AB) = 10, (Ab) = 20, (aB) = 15 and (ab) = 25. [8.2]

3. Of 598 men in a locality exposed to cholera 147 in all were attacked,
of 398 men 157 were inoculated and of these only 14 were attacked.
Find the number of persons not inoculated not attacked, inoculated
not attacked and not inoculated attacked. [8.3]

4. In a study of tempers of brothers and sisters it was found :

Good natured brothers and good natured sisters 1,2341 

Good natured brothers and sullen sisters 850

Sullen brothers and good natured sisters 530

Sullen brothers and sullen sisters 980

Find the total number of cases studied, the number of good natured
brothers, number of good natured sisters and number of sullen brothers
and sisters. [8.4]

5. In a certain class it was found that 80% of students passed in terminal
examination, 30% students passed in terminal and annual examination
while 25% were such who passed in annual but failed in terminal
examination. Find the percentage of students who passed in annual
examination, passed in terminal but failed in annual and who failed
in both examinations. [8.5]

6. Find all the remaining class frequencies from the following data : 

A**800, M)~244, (ffj-301, (C)- 150, (AB)~ 125, (AC)** 72, (AC.'-flO 
and {ABC)«*n [8 6] 

7. Given the following set of class frequencies, find the remaining class
frequencies :

(ABC) = 20, (ABc) = 27, (AbC) = 39, (Abc) = 46,

(aBC) = 37, (aBc) = 28, (abC) = 33, (abc) = 45. [8.7]

8. From the following set of class frequencies find the remaining class
frequencies.

AT-1,000, (A)-205, (A)- 115, (C)- 100, fr«Ac>Zl a [AH') ‘18. 

(aBC)*- 17, (ABC}* \2. [8-8] 

9. In a certain college it was found that 380 students passed in annual
examination, 100 students passed in all the three, first terminal, second
terminal and annual examination, 120 students passed in two terminals
but failed in annual, while 170 passed in annual but failed in both the
terminals. Find the number of students who passed at least two
examinations. [8.9]

10. Obtain all the class frequencies in the following example : At an
examination at which 600 candidates appeared, boys outnumbered
girls by 16%. Also those passing the examination exceeded in number
those failing by 310. The number of successful boys choosing science
subjects was 300, while among the girls offering arts subjects there
were 25 failures ; altogether only 135 offered arts and 33 among them
failed. Boys failing in the examination numbered 18.

ti.AX.fj*r> \9W 



11. In a report the following was given :

50% of items have characters A & B, 35% have A but not B, 25%
have B but not A.

Show there must be a misprint in the data. [8.11]

12. In a report on consumers' preference it was given that out of 500
surveyed 410 preferred variety A, 300 preferred variety B and 270
persons were such who gave their likings for both the varieties. Is
there any inconsistency in the data ? [8.12]

13. Prove that the following data are inconsistent : 

N i.* 1,000, (4)»-525. <B; 312, \A\~ 470. (44) 483, {Ac) -378, 
fjfc)* 226,(4*/*25- [6.10] 

14. A study was made about the studying habits of the students of a
certain university. The following summary was given at one place in the
report :

Of the students surveyed 75% were from well-to-do families, 55%
were boys and 60% were irregular in their studies. Out of the
irregular ones 50% were boys and 2/3 were from well-to-do families. The
percentage of irregular boys from well-to-do families was 8%.

Show that there is some inconsistency in the data. [8.14]

15. The following is a summary of the statistical features of a census of
ration cards :

Item No.   Categories                                      Total No. of cards

1.         The whole of census                                   1,000

2.         Permanent residents                                     590

3.         Males                                                   490

4.         Consumers of rice                                       427

5.         Permanent male residents                                100

6.         Consumers of rice among permanent residents             140

7.         Males consuming rice                                     97

Show that the entry against item No. 7 is inconsistent with the
entries against all the previous items 1, 2, 3, 4, 5 & 6 taken together.

(1947) [8.15]

16. 800 persons of London were asked by a B.B.C. investigator to give the
nationality of the music they liked. They returned the following
data :

570 liked English, 650 liked French, 400 liked German, 440 liked
English and French, 360 liked French and German, 240 liked English
and German and 225 liked all the three.

Show that the information as it stands must be incorrect. [8.16]

17. In a certain study of association for three variables the following
results were obtained.

N = 100, (A) = 40, (AB) = 38, (AC) = 35, (B) = 39 and (BC) = 37.

Can there be inconsistency in the above data ? [8.17]

18. Of 1,402 persons in a locality exposed to small-pox 368 in all were
attacked.

Of 1,402 persons 345 had been vaccinated and of these only 55
were attacked.

Can vaccination be regarded as a preventive measure for small-pox
from the data given above ?

(B.Com., 1949) [8.19]



19. The following table gives the number of persons suffering from certain
infirmities in U.P. in 1941 :

Sex        Total No.     Insane    Deaf mutes    Deaf mutes & insane

Males      260 lakhs     12,050      21,301             545

Females    241   "        9,055      14,136             317

Trace the association between insanity and deaf muteness for males
and females of U.P. separately. [8.20]

20. Calculate the coefficient of association between intelligence in father
and son from the following data :

Intelligent fathers with intelligent sons     248
Intelligent fathers with dull sons             81
Dull fathers with intelligent sons             92
Dull fathers with dull sons                   579

(M.A.) [8.21]

21. From the following figures compare the association between literacy
and unemployment in rural and urban areas and give reasons for the
difference, if any :

                                    Urban         Rural

Total adult males                  25 lakhs     200 lakhs

Literate males                     10   "        40   "

Unemployed males                    5   "        12   "

Literate and unemployed males       3   "         4   "

(M.A., Rajasthan) [8.22]

22. In an anti-malarial campaign in a certain area quinine was administered
to 812 persons out of a total population of 3,248. The number
of fever cases is shown below :

Treatment       Fever     No fever

Quinine           20         792

No quinine       220       2,216

Point out the usefulness of quinine in checking malaria.

(M.A., Agra) [8.23]

23. Calculate the coefficient of association between extravagance in fathers
and sons from the following data :

Extravagant fathers with extravagant sons     327

Extravagant fathers with miserly sons         345

Miserly fathers with extravagant sons         741

Miserly fathers with miserly sons             234

(M.A., Rajasthan)

24. Define independence in probability. x and y are two variables :
x taking the values 1, 2, 3, 4 and y the values 1, 2, 3. The frequencies
for different values of x and y are as follows :

                   x

    y        1      2      3      4

    1       14     26     22     18

    2       21     39     33     27

    3       35     65     55     45

Examine from definition or otherwise, whether x and y are
independent.

(M.Com., Delhi) [8.24]



25. Comment critically on the following statements :

(A) Road accident! resulted in 4,513 deaths in 1948 and 5,250 in 195". 
while the number of women drivers increased in the period.
Hence women make bad drivers. [8.34]

(b) More surtax-payers die in a year than the general death rate
would indicate, thus it is unhealthy to be rich. [8.33]

(c) Of the 100 persons inoculated against influenza during the
epidemic only 12% were affected. This shows a marked improvement
compared with the remainder of the population for which
the equivalent figure was 28%.

(M.A., Delhi, 1954 & 55)

26. Comment on the following statements :

(a) There should be no discrimination in regard to the rates of dearness
allowance payable to bachelor and married gazetted officers,
because enquiries show that 80% of bachelor government employees
have aged parents or other dependents to support. [8.33]

(b) Non-cultivating land owners should be deprived of their ownership
rights without payment of compensation because 90% of the
land owned by them is inherited property for which they have
paid nothing.

(M.Com., Delhi, 1957) [8.34]

(c) 99% of the people who drink beer die before reaching 100 years
of age. Therefore drinking beer is bad for longevity.

(I.A.S., 1946) [8.33]



Chapter 20 
Interpolation 


Interpolation may be defined as the estimation of the
most likely figure of a dependent variable from the given
relevant facts. If we are given two variables x and y
simultaneously and if one of them is known to be a function
of the other, the one which is the function is called the dependent
variable and the other one the independent variable.
A variable is said to be a function of the other if for any
value of the independent variable (say x) we can always find
a definite value of the dependent variable (say y). Thus if
y is the function of x given by x², the value of y would
be 25 when x = 5. We generally use y_x to denote the general
value of y, the suffix x denoting the value of x at which y is
taken (y_5 denotes the value of y when x = 5).

If (as often happens) the exact function is not known
and instead we know some values of y for certain values of
x, the value of y for any given x cannot be exactly determined.
In a case like this we can, at best, make an estimate
on the basis of the facts that are available. Thus if the
values of y for x = 1, 2, 3 and 5 are 16, 25, 36 and
64, we can assume that y_x = (x + 3)² and from this the value
of y for any given x can be estimated, e.g., y_4 = 49 or y_7 = 100.
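The reasoning of this example can be checked with a few lines of Python. The function name `y` and the dictionary `known` are assumptions for this sketch; the assumed law is y = (x + 3)², as in the text.

```python
def y(x):
    """Assumed law connecting the two variables: y = (x + 3)^2."""
    return (x + 3) ** 2


# Known values of y at x = 1, 2, 3 and 5
known = {1: 16, 2: 25, 3: 36, 5: 64}

# The assumed law reproduces every known value ...
fits = all(y(x) == value for x, value in known.items())

# ... so it can be used to estimate the missing ones
print(fits, y(4), y(7))
```

Here y(4) lies within the known range of x, while y(7) lies beyond it.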

This process of estimation is known as 'Interpolation'. It
is of great value in statistical work. The median and the
mode for grouped data are computed by a simple process of
interpolation. This technique also enables us to make the best
estimates of missing figures in statistical data.

Whenever the method of interpolation is applied it is based
on the assumption that the variable whose value is to be
estimated is a function of the other variable, i.e., there is
some regular law connecting the two variables. The law that
is usually assumed to exist is that of a polynomial relationship.

The process of extrapolation is the same as that of interpolation
and the underlying assumption also is similar. The
only difference between the two is that whereas interpolation
refers to the estimation of a figure within the given limits,
extrapolation denotes estimation beyond these limits.
If from the population figures for 1901, 1911, 1921, 1931 and
1941 we are to make an estimate for 1936, the technique used
is called 'interpolation'. But if it is desired to estimate it
for any year after 1941 the process will be termed
'extrapolation'.

Methods of Interpolation 

The methods of interpolation are :

(i) Graphic method, and (ii) Algebraic method.

(i) Graphic method. The graphic method is applicable
to all types of data and the rules of drawing the curve for
this purpose are the same as discussed earlier in this book. The
independent variable is represented on the abscissa or the
axis of x and the dependent variable is shown on the ordinate
or the axis of y. When the points have been plotted they are
joined by straight lines. The line so obtained is then smoothed.
This smooth curve will enable us to determine the value
of y for any x within the given limits. Thus if figures of
population are available for 1901, 1911, 1921, 1931, 1941 and
1951 and it is desired to find the population for 1946, the
method would be as follows :

(1) Mark years along the x axis, the point of intersection
representing 1901.

(2) Represent population along the y axis ; plot points and
connect them.

(3) Smooth the curve.

(4) Draw a perpendicular from point 1946 on the x axis
and extend it till it cuts the smoothed curve.

(5) From the point of intersection obtained in (4) draw a
line parallel to the x axis and extend it to the left till it
cuts the y axis.

(6) This point on the y axis (given in 5) will give us the
estimate of population for 1946.

If it is desired to extrapolate for the year 1955 the
procedure would be as under :

(1) Extend the smoothed curve to the required point, and

(2) Adopt the same procedure as described in the case of
interpolation.

The graphic method is simple and it gives a broad
idea of the relationship. But it requires graphic skill and the
results given by it may vary from individual to individual.

(ii) Algebraic method. The algebraic method may take
any of the following forms :

(a) Parabolic Curve Method,

(b) Newton's Method,

(c) Binomial Expansion Method, and

(d) Lagrange's Method.

(a) Parabolic Curve Method. This method, like the graphic
method, is applicable in all cases, but may require
lengthy calculations. The curve is expressed by the following
equation :

Y = a + bx + cx² + dx³ + ... + nxⁿ

where Y is the dependent variable and x the independent
variable ; a, b, c, d, ..., n are the various constants. If
four values of the dependent variable are known, we take,
for interpolation purposes, a parabolic curve of four constants.
I limitation : 


The following are the sales of a retail store in Delhi. It
is required to interpolate the sales for 1950.

Year        Sales (in thousands)
1948        200
1949        240
1950        ?
1951        350
1952        400






Solution :

Since four values of the dependent variable are known,
we would take the curve of four constants, or of (N-1)th order,
(4-1), i.e., of 3rd order,

$$Y = a + bx + cx^2 + dx^3 \qquad (P)$$

Now the four known values of the dependent variable
Y would be sufficient to find out values of the four constants
a, b, c and d and consequently the sales of 1950. The x class
intervals are equal and we take deviations from 1950.

x      -2     -1      0      1      2
Y     200    240    y_0    350    400

where $y_0$ is the number to be estimated. Since all the
points would be on the curve with equation (P) we substitute
the above values of x in the equation and get :

$$
\begin{aligned}
200 &= a - 2b + 4c - 8d && \text{(i)} \\
240 &= a - b + c - d && \text{(ii)} \\
y_0 &= a && \text{(iii)} \\
350 &= a + b + c + d && \text{(iv)} \\
400 &= a + 2b + 4c + 8d && \text{(v)}
\end{aligned}
$$

Equation (iii) tells us that the value of $y_0 = a$. We have
now to find out from the remaining four simultaneous
equations the value of a.

$$
\begin{aligned}
\text{Adding (i) and (v), we have} \quad & 2a + 8c = 200 + 400 = 600 && \text{(vi)} \\
\text{Adding (ii) and (iv), we get} \quad & 2a + 2c = 240 + 350 = 590 && \text{(vii)} \\
\text{Multiplying (vii) by 4, we get} \quad & 8a + 8c = 2{,}360 && \text{(viii)} \\
\text{Subtracting (vi) from (viii),} \quad & 6a = 1{,}760 \\
& a = 293.3
\end{aligned}
$$

The sales of 1950 as interpolated, therefore, are 293.3 thousand.

The above method involves the formation and simplification
of simultaneous equations and due to this reason it is
sometimes called the method of simultaneous equations.
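
The simultaneous equations of the parabolic curve method can also be solved mechanically. The following is only a minimal sketch in Python, assuming the four known sales figures of the illustration and deviations of x taken from 1950; the helper name `fit_polynomial` is hypothetical and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def fit_polynomial(xs, ys):
    """Return the constants a, b, c, ... of y = a + b*x + c*x**2 + ...
    passing through the given points (one equation per known point)."""
    n = len(xs)
    # Augmented matrix: each row is [1, x, x^2, ..., x^(n-1) | y].
    rows = [[Fraction(x) ** k for k in range(n)] + [Fraction(y)]
            for x, y in zip(xs, ys)]
    for col in range(n):
        # Find a row with a non-zero entry in this column and normalise it.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        # Eliminate this column from every other equation.
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [v - factor * p for v, p in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]

# Deviations from 1950 for the years 1948, 1949, 1951, 1952.
xs = [-2, -1, 1, 2]
ys = [200, 240, 350, 400]
a, b, c, d = fit_polynomial(xs, ys)

# At x = 0 (the year 1950) the curve reduces to Y = a.
print(round(float(a), 1))
```

Since x = 0 stands for 1950, the interpolated figure is simply the constant a, which agrees with the value obtained by hand.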

(b) Newton's Method for Equal Intervals. This method is
applicable when the independent variable advances by equal
intervals and gives the best estimate for interpolation near the
beginning of the table. The method is fairly easy in
calculation and the students would be able to follow it from
the given examples.

Illustration : 


The following table shows the expectation of life at different
ages. You are required to find out the expectation of life at
the age of 16.

Age in years        Expectation of life
10                  35
15                  33
20                  29
25                  27
30                  22
35                  20

Solution :

Age    Expectation    First    Second    Third    Fourth    Fifth
 x          y          Δ¹        Δ²        Δ³       Δ⁴        Δ⁵

10         35
                       -2
15         33                    -2
                       -4                  +4
20         29                    +2                 -9
                       -2                  -5                +20
25         27                    -3                +11
                       -5                  +6
30         22                    +3
                       -2
35         20

Each entry in the difference columns is obtained by taking
the algebraic difference of the entries on the left. Thus

$\Delta^1_0 = 33 - 35 = -2$ ; $\Delta^1_1 = 29 - 33 = -4$ ;

$\Delta^2_0 = -4 - (-2) = -2$ ; $\Delta^2_1 = -2 - (-4) = 2$.

In this manner, all the entries in this table have been
calculated.

The number of differences that will be required for this
purpose can be found out from the last index of Δ, i.e., $\Delta^5$.

The following is Newton's Formula :

$$
y_x = y_0 + x\,\Delta^1_0 + \frac{x(x-1)}{1 \times 2}\,\Delta^2_0
+ \frac{x(x-1)(x-2)}{1 \times 2 \times 3}\,\Delta^3_0
+ \frac{x(x-1)(x-2)(x-3)}{1 \times 2 \times 3 \times 4}\,\Delta^4_0 + \dots
$$

where $y_x$ is the figure to be interpolated, the Δ's are the
differences and x is calculated as follows :

$$x = \frac{\text{Year of interpolation} - \text{Year of origin}}{\text{Time distance between adjoining years}}$$


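From the table we find :

$x = \dfrac{16 - 10}{5} = \dfrac{6}{5}$ ; $y_0 = 35$ ;

$\Delta^1_0 = -2$ ; $\Delta^2_0 = -2$ ; $\Delta^3_0 = 4$ ; $\Delta^4_0 = -9$ ; $\Delta^5_0 = 20$.

By substituting the above values, we have

$$
\begin{aligned}
y_x &= 35 + \frac{6}{5}(-2)
+ \frac{\frac{6}{5}\left(\frac{6}{5}-1\right)}{1 \times 2}(-2)
+ \frac{\frac{6}{5}\left(\frac{6}{5}-1\right)\left(\frac{6}{5}-2\right)}{1 \times 2 \times 3}(4) \\
&\quad + \frac{\frac{6}{5}\left(\frac{6}{5}-1\right)\left(\frac{6}{5}-2\right)\left(\frac{6}{5}-3\right)}{1 \times 2 \times 3 \times 4}(-9)
+ \frac{\frac{6}{5}\left(\frac{6}{5}-1\right)\left(\frac{6}{5}-2\right)\left(\frac{6}{5}-3\right)\left(\frac{6}{5}-4\right)}{1 \times 2 \times 3 \times 4 \times 5}(20) \\
&= 35 - \frac{12}{5} - \frac{6}{25} - \frac{16}{125} - \frac{81}{625} - \frac{504}{3{,}125} \\
&= 35 - 2.4 - .24 - .13 - .13 - .16 \\
&= 35 - 3.06 \\
&= 31.94 \text{ years.}
\end{aligned}
$$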
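
The same working can be sketched in a few lines of Python. This is an illustrative sketch only: the function names `difference_table` and `newton_forward` are hypothetical, and equal intervals in x are assumed, as Newton's method requires.

```python
def difference_table(ys):
    """Return [y, first differences, second differences, ...]."""
    table = [list(ys)]
    while len(table[-1]) > 1:
        prev = table[-1]
        table.append([b - a for a, b in zip(prev, prev[1:])])
    return table

def newton_forward(xs, ys, target):
    """Newton's formula for equal intervals, using the leading
    entry of every difference column."""
    h = xs[1] - xs[0]                 # common interval
    x = (target - xs[0]) / h          # (point of interpolation - origin) / interval
    leading = [column[0] for column in difference_table(ys)]
    estimate, coefficient = 0.0, 1.0
    for k, diff in enumerate(leading):
        estimate += coefficient * diff
        # Build x(x-1)...(x-k) / (1*2*...*(k+1)) for the next term.
        coefficient *= (x - k) / (k + 1)
    return estimate

ages = [10, 15, 20, 25, 30, 35]
expectation = [35, 33, 29, 27, 22, 20]
print(round(newton_forward(ages, expectation, 16), 2))
```

Because every leading difference is used, the formula is carried up to the fifth difference, exactly as in the hand calculation.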

(N.B. If there are five differences in the table, then
Newton's equation should be extended only up to the fifth
difference ; if more or less, then up to that difference.

This method can also be used for estimating values beyond
the given limits.)
Illustration :

From the following figures find the premium payable at the
age of 40 :

Age in years             20       25       30       35
Annual premium Rs.      28-5    31-12    35-10     40-8

(Amounts are in rupees and annas ; 16 annas make one rupee.)

Solution :

 x       y         Δ¹        Δ²       Δ³       Δ⁴
        Rs.       Rs.       Rs.      Rs.      Rs.

20     28-5
                  3-7
25     31-12                0-7
                  3-14               0-9
30     35-10                1-0                0
                  4-14               0-9
35     40-8                 1-9
                  6-7
40     46-15

For age 40, $x = \dfrac{40 - 20}{5} = 4$

$$
y_x = y_0 + x\,\Delta^1_0 + \frac{x(x-1)}{1 \times 2}\,\Delta^2_0 + \frac{x(x-1)(x-2)}{1 \times 2 \times 3}\,\Delta^3_0
= y_0 + 4\,\Delta^1_0 + 6\,\Delta^2_0 + 4\,\Delta^3_0
$$

              Rs.  a.  p.
y_0           28   5   0
4 × 3-7       13  12   0
6 × 0-7        2  10   0
4 × 0-9        2   4   0
              -----------
              46  15   0

Hence the premium required is Rs. 46-15-0.

There is yet another method which can be used when the
value to be estimated is for the next step beyond the given
limits. This method consists in obtaining differences in the
usual manner, assuming the last differences to be constant
and building up differences by addition backwards as given in
the illustration above. The figures at the end of each column
have been obtained by addition as follows :

0 added to Re. 0-9 gives 9 annas ; 9 annas added to
Re. 1-0 give Re. 1-9 ; Rs. 4-14 added to Re. 1-9 give
Rs. 6-7 ; and Rs. 6-7 added to Rs. 40-8 give Rs. 46-15, the
premium payable at the age of 40.
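
The building-up of the difference table by addition can be sketched as follows. This is only an illustration under stated assumptions: the premiums are converted to annas (16 annas to the rupee), the highest difference is taken as constant, and the helper names are hypothetical.

```python
def to_annas(rupees, annas):
    """Convert a rupees-annas amount to annas (16 annas = 1 rupee)."""
    return 16 * rupees + annas

def from_annas(total):
    """Convert annas back to a (rupees, annas) pair."""
    return divmod(total, 16)

def next_value(ys):
    """Estimate the entry one step beyond the table, assuming the
    highest-order difference stays constant."""
    last_entries = []
    column = list(ys)
    while column:
        last_entries.append(column[-1])
        column = [b - a for a, b in zip(column, column[1:])]
    # Adding the last entry of each column, from the highest difference
    # back to y, is the same as building the table up by addition.
    return sum(last_entries)

premiums = [to_annas(28, 5), to_annas(31, 12), to_annas(35, 10), to_annas(40, 8)]
print(from_annas(next_value(premiums)))
```

The pair printed is the premium at age 40 in rupees and annas.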







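Illustration :

Estimate the number of persons whose income is between
Rs. 400 and 500 from the following figures :

Income in Rs.        Number of persons (in '000)
0 - 200              120
200 - 400            145
400 - 600            200
600 - 800            250
800 - 1000           150

Solution :

Income       Number of
up to Rs.    persons       Δ¹       Δ²       Δ³       Δ⁴       Δ⁵
   x            y

    0            0
                           120
  200          120                   25
                           145                30
  400          265                   55               -35
                           200                -5               -110
  600          465                   50              -145
                           250              -150
  800          715                 -100
                           150
 1000          865

For income up to Rs. 500, $x = \dfrac{500 - 0}{200} = \dfrac{5}{2}$

$$
y_x = y_0 + x\,\Delta^1_0 + \frac{x(x-1)}{1 \times 2}\,\Delta^2_0
+ \frac{x(x-1)(x-2)}{1 \times 2 \times 3}\,\Delta^3_0
+ \frac{x(x-1)(x-2)(x-3)}{1 \times 2 \times 3 \times 4}\,\Delta^4_0
+ \frac{x(x-1)(x-2)(x-3)(x-4)}{1 \times 2 \times 3 \times 4 \times 5}\,\Delta^5_0
$$

Substituting the values,

$$
\begin{aligned}
y_x &= 0 + \frac{5}{2} \times 120
+ \frac{\frac{5}{2}\left(\frac{5}{2}-\frac{2}{2}\right)}{1 \times 2} \times 25
+ \frac{\frac{5}{2}\left(\frac{5}{2}-\frac{2}{2}\right)\left(\frac{5}{2}-\frac{4}{2}\right)}{1 \times 2 \times 3} \times 30 \\
&\quad + \frac{\frac{5}{2}\left(\frac{5}{2}-\frac{2}{2}\right)\left(\frac{5}{2}-\frac{4}{2}\right)\left(\frac{5}{2}-\frac{6}{2}\right)}{1 \times 2 \times 3 \times 4} \times (-35)
+ \frac{\frac{5}{2}\left(\frac{5}{2}-\frac{2}{2}\right)\left(\frac{5}{2}-\frac{4}{2}\right)\left(\frac{5}{2}-\frac{6}{2}\right)\left(\frac{5}{2}-\frac{8}{2}\right)}{1 \times 2 \times 3 \times 4 \times 5} \times (-110)
\end{aligned}
$$

Simplifying, we get

300 + 46.9 + 9.4 + 1.4 - 1.3 = 356.4

Up to income of Rs. 400 there are 265,000 persons.
Therefore persons with income between Rs. 400 and 500
are 356.4 - 265 = 91.4 thousand.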

(c) Binomial Expansion Method. This method is applicable
when the independent variable moves by equal steps and the
value to be interpolated is for one of these steps. Before
coming to the actual formula let us introduce a new symbol
E which is used in this connection.

E of any entry represents the entry at the next higher step.

Thus $Ey_0 = y_1$, $Ey_1 = y_2$, etc.

Just as powers of Δ represent the order of the difference,
in a similar manner E can also be used in powers, $E^n$ of any
entry representing the entry at n steps after the given entry.
Thus $E^2y_0 = y_2$, $E^3y_0 = y_3$, $E^2y_1 = y_3$, etc.

We have $Ey_0 = y_1$ and $\Delta y_0 = y_1 - y_0$

$\therefore \Delta y_0 = Ey_0 - y_0$ or $(E - 1)\,y_0$

$\therefore \Delta = E - 1$ symbolically.

As stated earlier if 5 entries are given, differences up to the 4th
order can be calculated. The rest of the terms in Newton's
formula are not used, i.e., they are assumed to be zero. This
assumption is the same as if we assume that the differences higher
than the one calculated (5th and above in the above case)
are all zeros. The binomial expansion method is based on
this fact. If we are given n entries we will assume that the
nth order difference is zero,

$$\text{i.e.,} \quad \Delta^n y_0 = 0 \quad \text{or} \quad (E - 1)^n\, y_0 = 0$$

$$\left(E^n - nE^{n-1} + \frac{n(n-1)}{1 \times 2}E^{n-2} - \dots\right) y_0 = 0$$

$$\text{or} \quad y_n - n\,y_{n-1} + \frac{n(n-1)}{1 \times 2}\,y_{n-2} - \dots = 0$$

The illustrations given below will explain the procedure 
clearly. 

Illustration :

Compute the population of 1901 from the following table :

Year        Population (in millions)
1881        253    y_0
1891        287    y_1
1901        ?      y_2
1911        315    y_3
1921        319    y_4

Solution :

We have to find out the value of $y_2$.

Since four values are given we will assume $\Delta^4 y_0 = 0$

$$
\begin{aligned}
\therefore \quad & (E - 1)^4\, y_0 = 0 \\
\text{or} \quad & (E^4 - 4E^3 + 6E^2 - 4E + 1)\, y_0 = 0 \\
\text{or} \quad & y_4 - 4y_3 + 6y_2 - 4y_1 + y_0 = 0
\end{aligned}
$$

Substituting the values, we have

$$
\begin{aligned}
& 319 - 4 \times 315 + 6y_2 - 4 \times 287 + 253 = 0 \\
\text{or} \quad & 319 - 1260 + 6y_2 - 1148 + 253 = 0 \\
& 6y_2 = 1836 \\
& y_2 = 306
\end{aligned}
$$

Hence the population of 1901 = 306 millions.
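
The binomial expansion method lends itself to a short program. The sketch below is illustrative only and assumes equally spaced entries with exactly one missing value, marked `None`; the function name `fill_single_gap` is hypothetical.

```python
from fractions import Fraction
from math import comb

def fill_single_gap(values):
    """Find the one missing entry by setting the n-th order difference
    to zero, where n is the number of known values."""
    n = len(values) - 1
    missing = values.index(None)
    # Coefficients of y_0 ... y_n in the expansion of (E - 1)^n y_0.
    coefficients = [(-1) ** (n - k) * comb(n, k) for k in range(n + 1)]
    known_sum = sum(c * v for c, v in zip(coefficients, values) if v is not None)
    # coefficient * y_missing + known_sum = 0
    return Fraction(-known_sum, coefficients[missing])

# Population (in millions) for 1881, 1891, 1901 (missing), 1911, 1921.
population = [253, 287, None, 315, 319]
print(fill_single_gap(population))
```

The same function serves for extrapolation by one step: the missing entry is then simply placed at the end of the list.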







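Illustration :

Extrapolate the cost of living index for 1952 from the
following data :

Year        Cost of Living Index
1946        328
1947        378
1948        471
1949        478
1950        434
1951        451

Solution :

Since six values are known, we have $\Delta^6 y_0 = 0$

$$
\begin{aligned}
\text{or} \quad & (E - 1)^6\, y_0 = 0 \\
\therefore \quad & (E^6 - 6E^5 + 15E^4 - 20E^3 + 15E^2 - 6E + 1)\, y_0 = 0 \\
\text{or} \quad & y_6 - 6y_5 + 15y_4 - 20y_3 + 15y_2 - 6y_1 + y_0 = 0 \\
\text{or} \quad & y_6 - 2706 + 6510 - 9560 + 7065 - 2268 + 328 = 0 \\
\text{or} \quad & y_6 = 2706 - 6510 + 9560 - 7065 + 2268 - 328 \\
\text{or} \quad & y_6 = 631
\end{aligned}
$$

631 is the cost of living index for 1952.

Illustration :

Interpolate the missing figures in the following table of rice
cultivation :

Year                   1911   1912   1913   1914   1915   1916   1917   1918   1919
Area (million acres)   76.6   78.6    ?     77.7   78.7    ?     80.6   77.6   78.7

Solution :

Since seven values of the dependent variable are known, all
the seventh order differences and higher order differences are
assumed to be zero. As two values are missing we will require
two equations.

$$
\begin{aligned}
\text{Let} \quad & \Delta^7 y_0 = 0 \ \text{ and } \ \Delta^7 y_1 = 0 \\
\text{or} \quad & (E - 1)^7 y_0 = 0 \ \text{ and } \ (E - 1)^7 y_1 = 0 \\
\text{or} \quad & (E^7 - 7E^6 + 21E^5 - 35E^4 + 35E^3 - 21E^2 + 7E - 1)\, y_0 = 0 \\
\text{and} \quad & (E^7 - 7E^6 + 21E^5 - 35E^4 + 35E^3 - 21E^2 + 7E - 1)\, y_1 = 0
\end{aligned}
$$

The two equations are :

$$
\begin{aligned}
& y_7 - 7y_6 + 21y_5 - 35y_4 + 35y_3 - 21y_2 + 7y_1 - y_0 = 0 \\
\text{and} \quad & y_8 - 7y_7 + 21y_6 - 35y_5 + 35y_4 - 21y_3 + 7y_2 - y_1 = 0
\end{aligned}
$$

Substituting the values,

$$
\begin{aligned}
& 77.6 - 7 \times 80.6 + 21y_5 - 35 \times 78.7 + 35 \times 77.7 - 21y_2 + 7 \times 78.6 - 76.6 = 0 && \text{(i)} \\
& 78.7 - 7 \times 77.6 + 21 \times 80.6 - 35y_5 + 35 \times 78.7 - 21 \times 77.7 + 7y_2 - 78.6 = 0 && \text{(ii)}
\end{aligned}
$$

Simplifying equations (i) and (ii), we get

$$
\begin{aligned}
21y_5 - 21y_2 &= 48.0 && \text{(iii)} \\
35y_5 - 7y_2 &= 2{,}272.3 && \text{(iv)}
\end{aligned}
$$

Multiplying (iv) by 3

$$105y_5 - 21y_2 = 6{,}816.9 \qquad \text{(v)}$$

Subtracting (iii) from (v), we have

$$84y_5 = 6{,}768.9, \qquad y_5 = 80.6$$

Substituting the value of $y_5$ in equation (iii)

$$
\begin{aligned}
1{,}692.6 - 21y_2 &= 48.0 \\
21y_2 &= 1{,}644.6 \\
y_2 &= 78.3
\end{aligned}
$$

Hence the rice cultivation in 1913 and 1916 was 78.3 and
80.6 million acres respectively.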
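
When two values are missing, the two zero-difference equations can be formed and solved together. The following is a hedged sketch, assuming the rice figures for 1911 to 1919 with gaps at 1913 and 1916; the function name `fill_two_gaps` is hypothetical and exact fractions are used so that no rounding enters before the final step.

```python
from fractions import Fraction
from math import comb

def fill_two_gaps(values):
    """Solve for two missing entries (None) by setting the differences
    of order n, taken from y_0 and from y_1, equal to zero, where n is
    the number of known values."""
    order = len(values) - 2
    gaps = [i for i, v in enumerate(values) if v is None]
    equations = []
    for start in (0, 1):
        unknown = {g: Fraction(0) for g in gaps}
        constant = Fraction(0)
        for k in range(order + 1):
            c = (-1) ** (order - k) * comb(order, k)
            index = start + k
            if values[index] is None:
                unknown[index] += c
            else:
                constant += c * Fraction(values[index])
        # unknown[g0]*u + unknown[g1]*v = -constant
        equations.append((unknown[gaps[0]], unknown[gaps[1]], -constant))
    (a1, b1, c1), (a2, b2, c2) = equations
    det = a1 * b2 - a2 * b1          # Cramer's rule for two unknowns
    return (c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det

rice = ["76.6", "78.6", None, "77.7", "78.7", None, "80.6", "77.6", "78.7"]
y_1913, y_1916 = fill_two_gaps(rice)
print(round(float(y_1913), 1), round(float(y_1916), 1))
```

The figures are passed as strings so that `Fraction` reads them as exact decimals.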

(d) Lagrange's Method. When the independent variable
advances by unequal intervals, this method is the most suitable
and convenient. The formula is as under :

$$
\begin{aligned}
y_x &= y_0\,\frac{(x - x_1)(x - x_2) \dots (x - x_n)}{(x_0 - x_1)(x_0 - x_2) \dots (x_0 - x_n)}
+ y_1\,\frac{(x - x_0)(x - x_2) \dots (x - x_n)}{(x_1 - x_0)(x_1 - x_2) \dots (x_1 - x_n)} \\
&\quad + \dots
+ y_n\,\frac{(x - x_0)(x - x_1) \dots (x - x_{n-1})}{(x_n - x_0)(x_n - x_1) \dots (x_n - x_{n-1})}
\end{aligned}
$$

Here $y_x$ is the quantity to be interpolated, and x is the given
quantity in the x variable corresponding to which $y_x$ is to be
interpolated. $x_0, x_1, \dots, x_n$ are the given quantities of the
variable x and $y_0, y_1, \dots, y_n$ are the corresponding values of
variable y.

Illustration :

The observed values of a function are respectively 168, 120,
72 and 63 at the four positions 3, 7, 9 and 10 of the
independent variable. What is the best estimate you can give of
the value of the function at the position 6 of the independent
variable ?

Solution :

Independent Variable        Function
 3    x_0                   168    y_0
 6    x                      ?     y_x
 7    x_1                   120    y_1
 9    x_2                    72    y_2
10    x_3                    63    y_3

Applying Lagrange's formula, we get

$$
\begin{aligned}
y_x &= 168\,\frac{(6-7)(6-9)(6-10)}{(3-7)(3-9)(3-10)}
+ 120\,\frac{(6-3)(6-9)(6-10)}{(7-3)(7-9)(7-10)} \\
&\quad + 72\,\frac{(6-3)(6-7)(6-10)}{(9-3)(9-7)(9-10)}
+ 63\,\frac{(6-3)(6-7)(6-9)}{(10-3)(10-7)(10-9)} \\
\text{or} \quad y_x &= 168\,\frac{(-1)(-3)(-4)}{(-4)(-6)(-7)}
+ 120\,\frac{(3)(-3)(-4)}{(4)(-2)(-3)}
+ 72\,\frac{(3)(-1)(-4)}{(6)(2)(-1)}
+ 63\,\frac{(3)(-1)(-3)}{(7)(3)(1)} \\
\text{or} \quad y_x &= 12 + 180 - 72 + 27 \\
&= 147
\end{aligned}
$$

Thus the value of the function at position 6 of the
independent variable is 147.
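
Lagrange's formula translates directly into a double loop. A minimal sketch in Python follows, assuming the four observed values of the illustration; the function name `lagrange` is hypothetical and fractions are used so that the arithmetic is exact.

```python
from fractions import Fraction

def lagrange(xs, ys, x):
    """Lagrange's interpolation formula for unequal intervals."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = Fraction(yi)
        for j, xj in enumerate(xs):
            if j != i:
                # One factor (x - x_j) / (x_i - x_j) for every other point.
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total

positions = [3, 7, 9, 10]
observed = [168, 120, 72, 63]
print(lagrange(positions, observed, 6))
```

At any of the given positions the function returns the observed value itself, which is a quick check on the working.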





EXERCISES 

1. The following table shows the values of an immediate life annuity for
every £100 paid :

Age in years 40 50 60 70

Annuity (£) 6.1 7.2 9.1 12.0

Interpolate for the age 42.

{M.A., Calcutta , tyyO) (9.2J 

2. Interpolate by the curve fitting method the quantity supplied for the
price of Rs. 5.

Frier 2 I 6 

Quantity supplied 0 IS 35 (9.1} 

3. The following values are given in a table :

x l 2 3 4 S 

y 21,600 226,981 - 250,047 202,141 

Using a suitable algebraic method, find the value of y for x = 3. Also
draw a graph of the above points on a piece of squared paper and
from this graph find the value of y for x = 4.8.

t/.d.y., im) (9.3] 

4. A life insurance company advertised the following immediate life
annuities per £100 paid :

Age in years 50 60 65 70

Annuity (£.s.d.) 6.5.0 8.6.0 9.18.0 12.2.0

By graphical means or otherwise estimate the corresponding values
for ages 62 and 67 years.

iff.Cm,, ft an amt, t 955 ) |9.4) 

5. Find by graphic method and by curve fitting method the number of
persons who have their incomes below Rs. 175 from the data given
below :

Income in R*. below Ml 100 200 375 

No. of perum* in *000 2 8 12 15 

(9 5] 

6. By constructing the difference table find the 7th term as well as the
general term of the sequence :

0, 0, 2, 6, 12, 20,

1/ .1.V.* n m ) [9X>\ 

7. Extrapolate population in 1961.

Year 1891 1901 1911 1921 1931 1941 1951

Population 165 167 143 126 150 175 291 

l AM , **,*., rm) (9.10J 

8. For a certain dependent variable y, values corresponding to the
independent variable x are given in the table below. Find the value
corresponding to x = 2.5 and 5.

x .. 0 1 2 3 4

y .. 4 12 32 76 150 jO.^j 

9. Find the expectation of life at age 16 from the following figures :

Age in years 10 15 20 25 30 35

Expectation of life .15*4 32 3 29*2 26 0 23*2 2 tH 

i HilWil fiU, i t ) 19.01 





10. Extrapolate the cost of living index for 1952 from the following data :

Year 1946 1947 1948 1949 1950 1951

Cost of living index 328 378 471 478 434 451

11. State Newton's formula for interpolation for equal intervals and the
assumptions underlying it. Use it to find the annual net premium at
age 25 from the table given below :

Age 20 24 28 32

Annual Net Premium .01427 .01581 .01772 .01998

il.A.S\ y i mo) [9,9] 

12. Estimate the number of candidates who got more than 48 but not
more than 50 marks from the following :

Marks up to 45 50 55 60 65

No. of candidates 447 484 505 511 514 [9.11]

13. Find out by interpolation from the following data the number of
workers earning Rs. 34 or more but less than Rs. 35 :

Earning less than Rs. 20 25 30 35 40

No. of workers 296 599 804 918 968

[9.12]

14. From the following data, estimate the number of persons earning wages
between 60 and 70 rupees :

Wages in rupees below 40 40-60 60-80 80-100 100-120

No. of persons in '000 250 120 100 70 50

i Vf.Om,, Agra, t<j$n [9.13J 

15. From the following data, estimate the number of persons having 
incomes between 1000 and 1500 in group A and B : 

Income below 500 500-1000 1000-2000 2000-3000 3000-4000

Group A 6000 4250 3600 1500 650

Group B 5000 4500 4800 2200 1500

{B.Com .» ttjtf) [9.22] 

16. The following table gives the average monthly production of pig iron
in India over a number of years. Find by suitable interpolation the
production figure for 1947.

Year 1942 1944 1946 1948 1950 1952 

Production of ptg 

iron in 000 ton* 150 1 118 8 112 2 117*1 130 4 140 $ 

Discuss the usefulness of your figures as an estimate of the actual 

production in 1947. 1 H. Com., Ihiki, rpjs) [9,16] 

17. From the information provided by the following table prepare a table
giving an estimate of the total consumption of flax in each year, 1941 to
1948.

Two years 1941-42 1943*44 1944-46 1947*48 

Prevail average monthly 

consumption '000 tom 2 29 2*98 3 09 3*67 

\M.A., DM> t m ) 19.34] 

18. Determine by Lagrange's formula the percentage number of criminals
under 35 years.

Age Under 25 30 40 50

% No. of criminals 52.0 67.3 84.1 94.4

> AM,. Agra, [9.20] 

19. The observed values of a function are respectively 168, 120, 72 and 63
at the four positions 3, 7, 9 and 10 of the independent variable. What
is the best estimate you can give for the value of the function at the
position 6 of the independent variable ? (I.A.S.) [9.21]





20. Mwm l^eM-2-aiS^ log,, 6M«2’8I82, log M 659».2-0189, log,. 

Find log10 656 using two different interpolation formulae available
for observations at unequal intervals, say Lagrange's formula and the
formula for divided differences.

J AS, i <>5*v (9.24} 

21. Interpolate the population for the year 1941 from the following
census data of a certain town using Lagrange's method :

Year 1911 1921 1931 1941 1951

Population in '000* 252 241 275 - W 

22. The age of mother and the average number of children born per
mother are given in the following table. Find by any method of
interpolation the average number of children born per mother aged 30-34.

Age of mother
in years            15-19   20-24   25-29   30-34   35-39   40-44

No. of children
born                 0.7     2.1     3.5      ?      5.7     5.8

A /XW/i., Allahabad, U/4^) [9,28] 

23. Interpolate the missing figure in the following table with the help of a
suitable formula :

1911 1912 1913 1914 1915 1916 1917

1,331 1,728 2,197 ? 3,375 4,096 4,913

(AhA. t Ihlhi, , mi (9.29J 

24. Interpolate the missing figures in the following table of rice cultivation :

1911 1912 1913 1914 1915 1916 1917 1918 1919

76.6 78.6 ? 77.7 78.7 ? 80.6 77.6 78.7

[ALA., Delhi , 

25. The table below gives the profits of a concern for a few years.
Estimate the profits for the year 1953 and 1950.

Year 1951 1952 1953 1954 195b 

Profit* in lakhs of R». 8*5 12 — 10 

(9.3DJ 

26. The function 3^x takes, as it should, the values 1, 3, 9 and 81 when
x equals 0, 1, 2 and 4 respectively. Applying any method of finite
differences obtain the value corresponding to x = 3. Explain why the
resulting value differs from 3^3 or 27.

(LAJ'. t /yjfjft 19,25j 

27. The following table gives the values of x and y. Find the value of y
when x = 4. Why is it different from 2^4, i.e., 16 ?

x 1 3 5 7 9

>28 32 128 512 (y.KiJ 



Chapter 21 
Vital Statistics 


A statistical study of human population has two aspects, viz.,
(i) a study of the composition of population at a point of
time, and (ii) a study of the changes that occur during a given
period, i.e., growth or decline of the population. Change in
the population is the outcome of events like births, deaths,
migration, marriages, divorces, etc. called 'vital events'.

These two aspects of study have given rise to two methods
for the collection of population data : (i) census taking, and
(ii) registration of vital events.
Vital statistics is the application of statistical methods to the
study of these facts, and has been defined as "the registration,
preparation, transcription, collection, compilation and
preservation of data pertaining to the dynamics of the population,
in particular pertaining to births, deaths, marital status and
the data and facts incidental thereto." The distinction between
these two methods of collecting demographic data is that the
former is a record of persons while the latter is a record of
events. This, however, is not a clear cut distinction, for vital
statistics can be obtained from census data, while the
composition of population may be derived from vital events.

Registration of Vital Facts 

In order that the data pertaining to births, deaths, marriages
etc. may be available for population studies it is necessary
that there should be a system of compulsory registration of
these vital events, and a central registration office which should
have full control over the system and be responsible for the
co-ordination of data. In addition to these, other essential
provisions for satisfactory birth and death registration are
(1) immediate registration, (2) the use of standard forms upon
which entries are to be made, and (3) a rigid enforcement of
the law.

Methods of Analysis of Vital Events

The population of a given geographic area at any point of
time may be expressed as

$$P(t) = P(0) + B(t) - D(t) + I(t) - E(t)$$

where P(t) represents total population at a point of time
P(0) total population at a given point of time taken as base
B(t) total number of births during the given period
D(t) total number of deaths during the given period
I(t) total number of immigrants
E(t) total number of emigrants

There are thus four factors which affect the
size of population, viz., births, deaths, immigration and
emigration. Of these, deaths and births are ordinarily the chief
determinants of the change in population and hence a great
emphasis is laid on the study of Mortality and Fertility.

Measures of Mortality. Mortality is not a single factor to
be expressed as a single number or index. The risk of death has
to be measured in several aspects and as such various kinds of
death rates are employed.

Total Crude Death Rate. A very common measure of decrease
in population due to death is the crude death rate. It
represents deaths per thousand of the population. The formula for
calculating this rate is

$$
\text{C.D.R.} = \frac{\text{Number of deaths in the population of a given geographic area during a given year}}{\text{Mid-year total population of the given geographic area during the given year}} \times 1{,}000
$$

Within broad limits, it gives the probability of dying of
persons in the population.

Crude death rate is widely used as an index of mortality.
It is easy to compute and also to understand. It requires only
the total number of deaths and the total mid-year population
in a given period of time. It gives a preliminary indication of
the level of mortality. But crude death rate has its limitations
too, particularly in inter-area comparison. Mortality normally
varies with age. If the age structures of the two populations
are different, the comparison of crude death rates may be
misleading. A population having a large number of old persons
will show a higher crude death rate than a population having
a large number of adults. Moreover, in calculating the crude
death rates, some deaths may not have been registered.

Crude death rate can be easily adopted for making
comparisons for the same area from year to year, because the
composition of the population is not likely to change much within
a year. But for fitting a long-term trend the effect of such
changes should be allowed for.

Age Specific Death Rates. Crude death rate simply reveals the
average number of deaths per 1,000 persons including infants,
children, adolescents, young and old persons. It does not give
an exact idea about the death rate in a particular section of
the population. In order to study the mortality conditions for
a particular section of the population, say, death rate among
infants under one year of age, we have to study the deaths
occurring among the infants. Such specific death rates
are known as age-specific death rates. In practice,
insurance companies are interested in death rates in various
age-groups of the population. In particular, age specific death rates
afford a sound basis for comparison between two populations.
Here we shall study some of the types of age specific death
rates.

Infant Mortality Rate. In infant mortality rate, we study
the death rate under one year of age of the newly born babies
in a given period of time. Generally, the risk of death is greater
during the first year of life than afterwards, and as such infant
mortality rate is higher than the crude death rate. It is,
therefore, regarded as one of the most sensitive indexes of health
conditions of the general population. The formula for its
calculation is :

$$
\text{Annual infant mortality rate} = \frac{\text{Total number of deaths under one year of age which occurred among the population of the given geographic area during a given year}}{\text{Total number of live births which occurred among the population of the given geographic area during the same year}} \times 1{,}000
$$

But the calculation of infant mortality rate may suffer from
one limitation, viz., babies who die soon after their births may
not be registered at all, as births or deaths. But even so,
it has its uses for comparison purposes. When such a rate is
computed separately for rural and urban areas, it will reveal
clearly the uneven distribution of health facilities. The infant
mortality rates may be computed separately for various
geographical sub-divisions or specific population groups in order
to study the effect of factors such as overcrowding, unemployment,
social habits and customs, dietary habits etc.

Mortality during Childhood. Just as we calculate the infant
mortality rate, i.e., death rate among infants under one year of age,
similarly we can calculate the death rate for children between
1-4 years of age, 5-15 years of age and the like. Usually mortality
rate during childhood is lower than the infant mortality rate.

Maternal Mortality Rate or Mortality of Reproductive Ages.
Maternal mortality rate measures the risk of dying from causes
associated with child-birth in the various age groups in the
reproductive span of life, 15-45 years of age. At these ages, death
rates among women are generally higher than those of men.
The formula is :

$$
\text{Annual maternal mortality rate} = \frac{\text{Total number of deaths due to child birth among the female population of the given geographic area during a given year}}{\text{Total number of live births which occurred among the population of the given geographic area during the same year}} \times 1{,}000
$$

Mortality at Advanced Ages. Death rate for the higher age
group, for 60 years and above, is quite high, perhaps greater
than the infant mortality rate. Generally, for advanced ages
only one death rate is calculated, for 60 years or above, or it may
be 65 years or above, and there is no accurate classification of
age like 60-65, 65-70 and the like. As such this rate may have
little meaning for such a large interval, say 60 and above, is
very crude and has the same limitations as the crude death
rate for the population at all ages. Even then it gives some
idea regarding the mortality factor among the elderly persons.

For purpose of comparison, and to find out the trend
of mortality conditions, it is better to consider age specific
death rates rather than to study the crude death rate of the
population at all ages. Taking the death rate of a particular
year as the base, we can find out the general trend of the death
rate in various sectors of the population and then we can make
comparison of the various regions or the various countries.
Further, for comparison of age specific death rates in the
population of two regions, the age intervals should be uniform.

Standard Death Rate

It is evident from the foregoing discussion that the death
rate in the same geographic area varies with different age
groups. Likewise it varies with sex in the same age group.
Since the crude death rate ignores the age and sex composition
of the population a comparison of these rates for two geographic
areas cannot give any correct idea about their respective
salubrity. Any population containing many persons round
about the age 10 to 25, where the rate is at its minimum, must
have a lower crude rate than another population containing
many infants or old people, at which points death rate is
relatively high. This may be so even if the age specific death
rate for each age group is lower in the latter case. This may be
illustrated by the following example.

TABLE 21.1

                        District A                           District B
Age group      Population   Deaths   Death rate     Population   Deaths   Death rate
                                     per 1,000                            per 1,000

0-10              4,000       36        9              3,000       30       10
10-25            12,000       48        4             20,000      100        5
25-60             6,000       66       11              4,000       48       12
60 & over         8,000      158       19.75           3,000       60       20

All ages         30,000      308       10.27          30,000      238        7.93
(crude rate)






A comparison of the two districts shows that in every age
group A has a lower death rate than B. Yet its crude death
rate is higher than that of B. The fallacy is due to the fact
that different age groups having varying death rates have
unequal importance in the two populations.
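
The fallacy can be checked numerically. The sketch below is illustrative only: it assumes the population and death figures of Table 21.1 as reconstructed here (the all-ages totals are recomputed from the rows), and the function names are hypothetical.

```python
def death_rate(deaths, population):
    """Deaths per 1,000 of the population."""
    return 1000 * deaths / population

# (population, deaths) for each age group, as read from Table 21.1.
district_a = {"0-10": (4000, 36), "10-25": (12000, 48),
              "25-60": (6000, 66), "60 and over": (8000, 158)}
district_b = {"0-10": (3000, 30), "10-25": (20000, 100),
              "25-60": (4000, 48), "60 and over": (3000, 60)}

def age_specific_rates(district):
    return {age: death_rate(d, p) for age, (p, d) in district.items()}

def crude_death_rate(district):
    total_population = sum(p for p, _ in district.values())
    total_deaths = sum(d for _, d in district.values())
    return death_rate(total_deaths, total_population)

print(age_specific_rates(district_a))
print(age_specific_rates(district_b))
print(round(crude_death_rate(district_a), 2), round(crude_death_rate(district_b), 2))
```

District A shows the lower rate in every age group and still the higher crude rate, because a much larger share of its population falls in the oldest group.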

For purpose of comparing the death rates in two geographic
areas it is, therefore, essential that age and sex differences in
the compositions of two populations should be eliminated. There
are two methods of doing it, viz., (1) Direct, and (2) Indirect,
and the rate obtained by each one of these methods is called
the Standardised Death Rate.

Direct Method. Under this method the mortality rates at
each age group in the two geographical areas are applied to
some common standard population. Thus we would get a
total death rate in that standard population if it were exposed
to mortality rates of one area ; and another total death rate
when it is exposed to mortality rates of another area. These
total rates, called standardised rates, show what would be the
mortality rates in each one of the two areas if they had
populations which were similar in their age and sex distributions.
These rates have only one purpose, i.e., comparison. By
themselves they do not indicate anything and are fictitious.

If the age specific mortality rates of Town A and B as given
in Table 21.1 are applied to the standard population as shown
in Table 21.2 below the standardised rates for town A and B
would be computed as given below :

TABLE 21.2

                 Standard               Town A                    Town B
                Population
Age group       Population      Mortality    Total        Mortality    Total
                                  Rate       deaths         Rate       deaths

0-10               1,000           9            9            10          10
10-25              4,000           4           16             5          20
25-60              3,000          11           33            12          36
60 and over        2,000          19.75        39.5          20          40

All ages          10,000                       97.5                     106

Standardised
Death Rate                                      9.75                     10.6
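
The direct method amounts to a weighted sum. The following minimal sketch assumes the standard population and the age specific rates set out in Table 21.2; the function name `directly_standardised_rate` is hypothetical.

```python
standard_population = {"0-10": 1000, "10-25": 4000, "25-60": 3000, "60 and over": 2000}

# Age specific death rates per 1,000, taken from Table 21.1.
rates_a = {"0-10": 9, "10-25": 4, "25-60": 11, "60 and over": 19.75}
rates_b = {"0-10": 10, "10-25": 5, "25-60": 12, "60 and over": 20}

def directly_standardised_rate(rates, standard):
    """Apply an area's age specific rates to the standard population and
    express the resulting deaths per 1,000 of that population."""
    expected_deaths = sum(standard[age] * rates[age] / 1000 for age in standard)
    return 1000 * expected_deaths / sum(standard.values())

print(directly_standardised_rate(rates_a, standard_population))
print(directly_standardised_rate(rates_b, standard_population))
```

With a common standard population the ordering of the two towns now agrees with the ordering of their age specific rates.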



VITAL STATISTICS 


m 


Indirect Method. As has been seen earlier the direct method
of standardising the death rates requires a knowledge of the
mortality rates of each age group in the population for which
standardised death rate is needed (see table above). Sometimes
this information is not available. In such instances the indirect
method of standardisation is applied. The first step in this
method is the selection of a series of standard death rates for
each age group (see col. 2 of table 21.3). These rates are then
applied to the population at various age groups in the areas,
the standardised rates of which are sought, to determine the
number of deaths that would have occurred in each area if it
had the standard mortality rates (see col. 4 and 6 of table 21.3).
It is thus found that if standard mortality rates are applied to
the population of Town A and Town B respectively there would
be 1,072 deaths in Town A, and 1,124 deaths in Town B, or
rates of mortality at all ages would be 16.75 and 18.73 per
thousand respectively. These rates are called index rates for
their level is an index of the type of population from which
they have been derived. Thus if a population has a large
proportion of old persons and infants its index rate would be
higher than that of a population composed of young persons.
If the two rates are different (as they are in the illustration)
it shows that the two populations are not identical so far as
their age structure is concerned and hence if their crude death
rates are to be compared some adjustment of them (crude
rates) is necessary to allow for the difference in population
type. These adjustments (called standardising factors) are
determined by dividing the standard mortality rate for all ages
(15 in this case) by the index rate of each one of the two towns,
viz., 15/16.75, i.e., .896 for A town and 15/18.73, i.e., .801 for B
town. Now if the given crude rate for town A is 20 and for
town B 22 the standardised death rates would be as under :

Town A    20 x .896 = 17.9

Town B    22 x .801 = 17.6

TABLE 21.3

                 Standard            Town A                        Town B
Age group        Mortality   Population   Number of deaths   Population   Number of deaths
                 Rate                     that would occur                that would occur
                                          at standard rate                at standard rate

Under 2            64           3,000           192              5,000           320
2 - 10              7          10,000            70             12,000            84
10 - 20             4          10,000            40             10,000            40
20 - 60             8          32,500           260             25,000           200
60 and above       60           8,500           510              8,000           480

All ages           15          64,000         1,072             60,000         1,124

According to this town B is marginally healthier than town A,
although its crude death rate is the higher of the two.
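The indirect method can be retraced step by step in a short sketch. It assumes that the Town A population under 2 is 3,000, the figure consistent with the printed totals of 64,000 persons and 1,072 expected deaths; the variable names are illustrative only.

```python
# Standard death rates per 1,000 for the five age groups of Table 21.3.
standard_rates = [64, 7, 4, 8, 60]
standard_rate_all_ages = 15

# Populations by age group. The first Town A figure (3,000) is an assumption
# chosen to agree with the printed totals.
population_a = [3000, 10000, 10000, 32500, 8500]
population_b = [5000, 12000, 10000, 25000, 8000]
crude_rate_a, crude_rate_b = 20, 22


def index_rate(population, rates_per_1000):
    """Death rate per 1,000 the town would show at the standard rates."""
    expected_deaths = sum(p * r / 1000 for p, r in zip(population, rates_per_1000))
    return expected_deaths / sum(population) * 1000


index_a = index_rate(population_a, standard_rates)
index_b = index_rate(population_b, standard_rates)

# Standardising factor = standard rate for all ages / index rate.
factor_a = standard_rate_all_ages / index_a
factor_b = standard_rate_all_ages / index_b

# Standardised death rate = crude death rate x standardising factor.
standardised_a = crude_rate_a * factor_a
standardised_b = crude_rate_b * factor_b
print(round(index_a, 2), round(index_b, 2))
print(round(factor_a, 3), round(factor_b, 3))
print(round(standardised_a, 1), round(standardised_b, 1))
```

With these inputs the index rates are 16.75 and 18.73, so dividing 15 by each gives a factor of about 0.896 for Town A and about 0.801 for Town B.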
Measures of Fertility. To study the growth of the population
of a given geographic area in a given period of time we have
to take into consideration the number of live births that occur.
As Barclay has put it, 'fertility is an actual level of performance,
based on the numbers of live births that occur'. It is
ascertained from the data collected by registration of births,
and is measured as the frequency of births in a population.
To calculate the frequency or speed with which population is
increasing we use some fertility rates. Some of the important
fertility rates are as follows :

Crude Birth Rate. It gives the average number of births per
1,000 persons in the population of a given area during a given
period of time. It is calculated as follows :

$$\text{Annual crude birth rate} = \frac{\text{Total number of live births which occurred among the population of a given geographic area during a given year}}{\text{Mid-year total population of the given geographic area during the same year}} \times 1{,}000$$






Crude birth rate is similar to the crude death rate and
suffers from the same limitations as the crude death
rate. It is affected by several factors like age and sex structure
of the population, marriage rate, migration etc. In order to
make a comparison of the crude birth rates of two
populations, allowance should be made for differences in the age
and sex distributions of the populations.

General Fertility Rate. Crude birth rate suffers from a great
limitation inasmuch as it relates the total number of live births
to the total mid-year population. But in fact, the total number
of live births depends upon the proportion of women of the
child-bearing age. To study the fertility rate as such we should
relate the total number of live births to the total female population
of the child-bearing age. The rate so calculated is
known as general fertility rate. It is computed as follows :

$$\text{General fertility rate} = \frac{\text{Total number of live births in a given population in a particular year}}{\text{Total number of females in reproductive span of life (say 15-49 yrs) in that very year}} \times 1{,}000$$

In general fertility rate, in the denominator, we take only
the number of potential mothers in the child-bearing age, and
not the total population as in the crude birth rate.

Age Specific Fertility Rate. Although general fertility rate is
an improvement over crude birth rate, it gives only a general
view of the fertility rate of the child-bearing age group (15 to
49 years) as a whole. For a detailed study we may calculate
what are known as age specific fertility rates for different
child-bearing age groups. It is calculated as follows :

$$\text{Annual age specific fertility rate} = \frac{\text{Number of live births which occurred to females of a specified age group of the population of a given geographic area during a given year}}{\text{Mid-year female population of the specified age group in the given geographic area during the same year}} \times 1{,}000$$

Age specific fertility rates are similar to age specific death
rates. They reveal the distribution of the frequencies of births
among the female population according to age. These rates
afford a detailed analysis of fertility in a given population of
a given period.

Total Fertility Rate. If we find out the sum of age specific
fertility rates at each age-group interval from 15 to 49 years
of age, it will give us the total fertility rate. This rate indicates,
with fertility as it is, how many children will be born
per thousand women arriving at child-bearing age, provided
none of these women dies before having passed the child-bearing
age.
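The four rates defined so far can be compared on one set of figures. The sketch below uses hypothetical data invented for illustration; none of the numbers come from the text. It also assumes the usual convention that, when the rates are given for five-year age groups, their sum is multiplied by the width of the group so that every single year of age is counted.

```python
# Hypothetical data for illustration only.
total_population = 200_000
age_groups = ["15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49"]
women = [10_000, 9_000, 8_000, 7_000, 6_000, 5_000, 5_000]
births = [500, 1_800, 1_600, 1_050, 600, 250, 50]
group_width = 5  # years covered by each age group

total_births = sum(births)

# Crude birth rate: births per 1,000 of the whole mid-year population.
crude_birth_rate = total_births / total_population * 1000

# General fertility rate: births per 1,000 women aged 15-49.
general_fertility_rate = total_births / sum(women) * 1000

# Age specific fertility rates: births per 1,000 women in each age group.
age_specific_rates = [b / w * 1000 for b, w in zip(births, women)]

# Total fertility rate per 1,000 women (assumed five-year group convention).
total_fertility_rate = group_width * sum(age_specific_rates)

print(crude_birth_rate, general_fertility_rate, total_fertility_rate)
for group, rate in zip(age_groups, age_specific_rates):
    print(group, round(rate, 1))
```

The crude rate is much lower than the general fertility rate simply because its denominator includes men, children and the aged, which is the limitation the general fertility rate is meant to remove.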

Sex Ratio at Birth. If we find out the ratio of total live
births of males divided by total live births of females in the
population of a given area in a particular period per 100, it is
sex ratio at birth. It is computed as follows :

$$\text{Sex ratio at birth} = \frac{\text{Total number of live births of males in the population of a given area during a given year}}{\text{Total number of live births of females of the given area during the same year}} \times 100$$

Gross Reproduction Rate

'Gross reproduction rate is the sum of the age specific
fertility rates calculated from female births for each single
year of child-bearing age.' In order to calculate the age
specific fertility rate, we have to relate the females born to
mothers of each specific year of age or age group to the
total number of women of that age or age group of a given
geographic area.

$$\text{Age specific fertility rate} = \frac{\text{Number of female live births to a specified age group of mothers of a given geographic area during a given year}}{\text{Total mid-year female population in that specific age group in the given geographic area during the same year}} \times 1{,}000$$

Summing up these age specific fertility rates for all ages in
the reproductive span of life, a measure of population growth
called G. R. R. is obtained. It provides an upper limit of
the rate of population growth, indicating the average number
of daughters that would be born to each group of 1,000 women
beginning life together, if none died before reaching the end of
the child-bearing period, and if they experienced the current
rate of fertility. If G. R. R. is 1 (that is, one daughter per woman),
it indicates that the current
generation of females of child-bearing age will maintain itself
on the basis of current fertility rate but without mortality.
But if it is less than one, then no amount of reduction of deaths
will enable it to escape decline sooner or later.

The computation of G. R. R. depends on the availability of
the data, i.e., classification of births according to age of mother
at the time of birth, and according to sex. But this two way
classification may not always be available. In the absence of
such data, we can approximately find out the value of G. R. R.
by an alternative method, by multiplying total fertility rate
by the proportion of births that were female, on the assumption
that sex ratio at birth, i.e., the ratio of the number of male
births to the number of female births, remained constant over
all ages of mothers.

$$\text{G. R. R.} = \text{Total Fertility Rate} \times \frac{\text{Number of female births}}{\text{Total number of births}}$$

Here total fertility rate is the total number of children that
would ever be born to a given group of women, if the group
passed through its reproductive span of life with these birth
rates at each year of age and if none died before reaching the
end of the reproductive period of life.
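The alternative method amounts to a single multiplication, as the following sketch shows. The total fertility rate used here is a hypothetical figure, and the ratio of 13 male to 12 female births is borrowed from Exercise 3 purely for illustration.

```python
def gross_reproduction_rate(total_fertility_rate, male_births, female_births):
    """Approximate G.R.R. as T.F.R. times the proportion of female births."""
    proportion_female = female_births / (male_births + female_births)
    return total_fertility_rate * proportion_female


# Hypothetical T.F.R. of 3,800 children per 1,000 women; 13 boys to 12 girls.
grr_per_1000_women = gross_reproduction_rate(3800, male_births=13, female_births=12)
grr_per_woman = grr_per_1000_women / 1000
print(grr_per_1000_women, grr_per_woman)
```

Only the ratio of male to female births matters, so the actual numbers of births need not be known.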

G. R. R. provides the hypothetical upper limit of the rate
of population growth. It recognises the current rate of
fertility. Its drawback is that it ignores the current mortality.
Some of the females who begin life together may die before
reaching the upper limit of child-bearing age. But G. R. R.
does not take this aspect of mortality into consideration. It
is assumed that out of a number of females beginning life
together none dies before reaching the end of the child-bearing
age. This drawback is removed in the computation of
net reproduction rate.






Net Reproduction Rate. Net reproduction rate indicates the
average number of daughters that would be born to a group
of women beginning their life together if they are subject to
the current fertility and mortality rates throughout their reproductive
span of life. It is computed by multiplying the age specific
fertility rate of each age or age group by the survival factor of
that age or age group. The sum of these products will be
N. R. R. Survival factor, i.e., the proportion of female survivors
to that age, is available from the life table. N. R. R. differs
from G. R. R., which takes into consideration the factor of
fertility only, i.e., the number of daughters expected to a group of
females if none of them died before completing the reproductive
span of life. N. R. R. uses the same age specific fertility
rates, but it also takes into consideration the survival factor taken
from a life table.

$$\text{N. R. R.} = \sum_{x=15}^{49} b_x \, \frac{L_x}{l_0}$$

where $b_x$ represents female births per person at each age $x$,

$L_x / l_0$ = number of years lived at each age per woman born to
the original group of females, and

$\sum_{15}^{49}$ = sum of these products for the reproductive span of life taken
from 15 to 49 years of age.

N. R. R. cannot exceed G. R. R. because it takes the mortality
factor into consideration. If it is one, it indicates that on
the basis of current fertility and mortality rates, a group of
newly born females will exactly replace itself in the next generation,
i.e., the tendency of the population is to remain constant.
It will show a tendency of increase or decrease in population if
it is greater than one or less than one, as the case may be.
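The relation between the two reproduction rates can be seen in a small sketch. All the figures are hypothetical and chosen for illustration; the rates are for five-year age groups, so each sum is multiplied by the width of the group, which is the usual convention and an assumption here.

```python
# Hypothetical female births per 1,000 women, by five-year age group
# (15-19 up to 45-49), and hypothetical survival factors from a life table.
female_birth_rates = [20, 100, 100, 70, 50, 20, 5]
survival_factors = [0.90, 0.89, 0.88, 0.87, 0.85, 0.83, 0.81]
group_width = 5

# Gross reproduction rate per woman: fertility only, no deaths allowed for.
grr = group_width * sum(female_birth_rates) / 1000

# Net reproduction rate per woman: each rate is first multiplied by the
# proportion of women surviving to that age group.
nrr = group_width * sum(
    rate * survival for rate, survival in zip(female_birth_rates, survival_factors)
) / 1000

print(round(grr, 3), round(nrr, 3))
```

Since every survival factor is below one, the net rate always comes out below the gross rate, which is the point made in the text.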

However, neither the gross reproduction rate nor the net
reproduction rate should be used for forecasting future
population changes. Firstly, because they do not take into
consideration the factor of migration. Secondly, the rates
of fertility and mortality are unlikely to remain the same as at
present.

Life Tables

A life table is just another and an effective way of expressing
the death rates experienced by some particular population
during a chosen period of time. It contains eight columns as
shown below :

LIFE TABLE*

Age   Living at   Deaths in   Probability   Probability    Years lived     Years lived    Expectation
      age x       year x      of dying      of surviving   between ages    after age x    of life
                                                           x and x+1
x     l_x         d_x         q_x           p_x            L_x             T_x            e_x

0     100,000     1,710       .01710        .98290         99,145          4,086,420      40.86
1      98,290     1,592       .01620        .98380         97,494          3,987,275      40.57
2      96,698     1,483       .01534        .98466         95,957          3,889,781      40.23
3      95,215     1,383       .01452        .98548         94,523          3,793,824      39.84
4      93,832     1,293       .01378        .98622         93,186          3,699,301      39.42
...
95         21         9       .40957        .59043             16                 32       1.52
96         12         6       .42932        .57068              9                 16       1.34
97          6         3       .44964        .55036              5                  7       1.17
98          3         2       .47046        .52954              2                  2       0.67
99          1         1       .49176        .50824              -                  -          -

* Adapted from Life Table (1941-50) from the Paper No. 2
of Census of India, 1951.

The Construction of Life Table

The basis of the table is the value commonly known as $q_x$,
which is the probability of dying between any age $x$ and age
$x + 1$, where $x$ can have any value between zero and the
longest duration of life. For example, $q_{10}$ is the probability
that a person who has reached his tenth birthday would die
before completing his eleventh year. These probabilities, one
for each year of age, are calculated from the death rates
experienced by the population in a particular year. The
probabilities are stated in the column headed $q_x$ in the table
given above. Once these values are known, the construction of
the life table is a simple process. The next step is to assume
an arbitrary number, say 100,000, at age zero. By applying to it the
probability of dying before the attainment of first birthday we
find the number who die in the first year of life. This number
of total deaths in this group of 100,000 infants before they
attain their first birthday would be written in the $d_x$ column.
By subtracting these deaths from the initial group of 100,000
we have the number of survivors at age one. For these
survivors at age one we know the probability of dying between
age one and age two. By applying this probability to those
who have attained the age one, the number of deaths between
age one and age two can be calculated. Subtracting these
deaths from the group which survived the first year we shall get
the survivors at the end of second year. This process would be
repeated till all are dead.*

This process has clearly indicated that if $q_x$ values are
known to us, the $l_x$ and $d_x$ columns can be easily constructed.

On the basis of the above discussion we are in a position to
explain more fully the contents of the first four columns
of the table.

The first column (called $x$) gives years of age.

The second column ($l_x$) gives the number of persons surviving
at each successive age out of those starting out together at birth.
Thus the number 100,000 in the $l_x$ column against '0' year age
indicates the number that began their life together and are
running the first year of their life. The figure 98,290 against
1 year age indicates the number who have completed first year
of their age and are running the second. Likewise 96,698
shows the number that completed second year and are running the
third.

* It may, however, be stated here that in the absence of adequate mortality
data the $p$ column may be constructed if the numbers of
persons living at successive ages are known.

The third column ($d_x$) indicates the number of deaths in
the year $x$. Thus 1,710 is the number of deaths in the group
before it attains one year of age ; 1,592 is the number of deaths among those
who have attained one year of age but have not completed
their second year.

The fourth column ($q_x$) gives the mortality rates to which
the population group would be exposed throughout their lives.

The fifth column ($p_x$) gives the probability of living from
one age to the next, i.e., from age $x$ to age $x + 1$. Since the
individuals must either live or die in a particular year of life,
$q_x + p_x = 1$ ; $p_x$, therefore, equals $1 - q_x$.

The sixth column ($L_x$) gives years of life lived by the group
between the ages $x$ and $x + 1$. This means that if a group of
100,000 infants began life together and if 1,710 die during the
first year of their lives, $L_0$ would be $100{,}000 - \tfrac{1}{2} \times 1{,}710 = 99{,}145$.
$L_1$ would be $98{,}290 - \tfrac{1}{2} \times 1{,}592 = 97{,}494$. This is
based on the assumption that deaths at each year of age are
evenly spread throughout the year.

The seventh column ($T_x$) gives the total number of years
lived by the group from the age $x$ until all of them die.

$$T_x = L_x + L_{x+1} + L_{x+2} + \dots$$

or $T_0 = L_0 + L_1 + L_2 + \dots$,

i.e., $T_0 = 99{,}145 + 97{,}494 + 95{,}957 + 94{,}523 + \dots = 4{,}086{,}420$

The eighth column ($e_x$) gives the expectation of life at
age $x$, i.e., the average number of years still remaining to a
person at age $x$ before his death.

$$e_x = \frac{T_x}{l_x}$$

For age 0, it is 40.86 ; at age 1 it is 40.57.
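The construction described above can be followed in a brief sketch, which rebuilds the first five rows of the life table from the printed $q_x$ values. The function and key names are illustrative. Because $T_x$ needs the $L_x$ values of every later age, the sketch takes the printed $T_0$ of 4,086,420 as given and works downwards from it, which is an assumption of convenience rather than the method of the text.

```python
def build_life_table(q_values, radix=100_000, total_years_at_birth=None):
    """Build l_x, d_x, p_x, L_x (and T_x, e_x if T_0 is supplied) from q_x."""
    rows = []
    living = radix
    remaining_years = total_years_at_birth
    for age, q in enumerate(q_values):
        deaths = round(living * q)              # d_x = l_x * q_x
        years_lived = living - deaths / 2       # L_x, deaths spread evenly
        row = {"age": age, "l": living, "d": deaths, "q": q,
               "p": 1 - q, "L": years_lived}
        if remaining_years is not None:
            row["T"] = remaining_years          # years lived after age x
            row["e"] = remaining_years / living  # e_x = T_x / l_x
            remaining_years -= years_lived      # T_(x+1) = T_x - L_x
        rows.append(row)
        living -= deaths                        # l_(x+1) = l_x - d_x
    return rows


q_printed = [0.01710, 0.01620, 0.01534, 0.01452, 0.01378]
table = build_life_table(q_printed, total_years_at_birth=4_086_420)
for row in table:
    print(row["age"], row["l"], row["d"], row["L"], round(row["e"], 2))
```

The $L_x$ values are left unrounded here, so they may differ from the printed whole numbers by half a year.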

The life table provides the following information :

1. If a certain number of individuals born on the same
day be exposed to mortality rates at different ages prevalent in
a particular year, in what gradation would they disappear ?

2. How many would still be alive at any given age ?

3. How many would die between any two ages, say, age
13 and age 25 ?

4. What would be the probability of an individual surviving
from age 50 to age 55 ?

5. What would be the average life of the group ?

6. What is the number of years that an individual may be
expected to live at a certain age ?

The insurance companies use these regularly revised life
tables to determine the cost of life policies.




EXERCISES

1. Calculate the crude and standardised death rates of the local population
from the following data and compare them with the crude death rate
of the standard population. What inference do you draw from the
comparison ?

                 Standard Population       Local Population
Age group        Population   Deaths       Population   Deaths

0-10                 600        18             400        16
10-20              1,000         5           1,500         6
20-60              3,000        24           2,400        24
60-100               400        20             700        21

Total              5,000        67           5,000        67

na.2) 

2. Construct an example to show how an incorrect picture can be presented
by crude death rates when we employ them for comparing the
salubriousness of two places. Show how the standardised death rates
can be used to avoid such pitfalls.

IIM) 

3. The following data give the number of women of child-bearing ages
and yearly births by quinquennial age groups for a city. Calculate the
general fertility rate and total fertility rate. If the ratio of male to
female children is 13 : 12, what is the gross reproduction rate ?

Age group           15-19   20-24   25-29     30-34   35-39   40-44   45-49

Female population   If*     15      H         n       12      11      9

Births              400     L*m     2 l is)   1,430   96ft    330     3$

MM1 

4. Compute the gross and net reproduction rates from the data given
below :

Age groups                 15-19    20-24    25-29     30-34    35-39    40-44    45-49

Female population ('000)   1,558    1,112    1,595     1,629    1,627    1,522    1,401

Female births              18,900   71,100   % t 900   84,200   34,900   10,800   800

Survival rate              0.914    0.899    ft WM     0.868    0.852    0.814    0.913


JIS.5J 


5. From the following data calculate the gross and net reproduction rates
of females. Can you also calculate similar rates for males ?

          Population in 000's    Children born to females    Survival rate to the
                                                             middle of age group
Age       Female     Male        Female     Male             Female     Male

15-20     10.3       10          312        300              0.902      0.90
20-25      9.4        9          692        630              MM1        0.99
25-30      8.2        8          477        490              0.879      ana
30-35      7.1        7          293        m                0-87 J     0.87
35-40      5.9        6          160        150              0.853      0.93
40-45      4.9        3           32         35              MSI        0.81


6. With the help of an example show that while calculating total fertility
rates if the number of women at each year of age is decreasing and
fertility rate is increasing, the result obtained from data expressed in
groups of five years is lower than the actual value and if fertility rate is
decreasing the result is higher than the actual one.

tlMl 




7. Following figures give the mortality rate of the life table ($q_x$) at
ages 0, 1, 2, 3, 4, 5 taken from the Life Table of males (All India)
liSK. Taking the radix as 10,000 calculate /*, /**, /,» values
for the various ages :

0.2407, 0.0918, 0.0564, 0.0592, 0.0274, and 0.0193.

nwj 

8. Fill in the blanks of the following skeleton table which are marked
with question marks.

Age x    l_x        d_x      p_x    q_x    L_x    T_x           e_x

20       693,435    2,762    ?      ?      ?      35,081,126    ?
21       690,673    -        -      -      -      ?             ?

r 5B.9j 

9. The $l_x$ column of a certain life table for ages 0, 1, 2, 3, ..., 100 is given
by the series 100, 99, 98, 97, ..., 0. Calculate the following values :

#‘0, 75* U. #10, T20, /2*» and ^?0 

fiO.IO] 






Chapter 22 
Statistical Quality Control 


Statistical quality control is one of the most useful and economically
important applications of the theory of sampling
and significance in the industrial field. An important feature of
modern industry is repetitive work turning out a large number
of presumably "identical products". But no two pieces following
one another off the same machine are identical in
measurable characteristics in spite of all the precision of modern
engineering. The variation may be infinitesimal but it does
exist. Because of the inevitability of the occurrence of variations
the users of the products (assembly plants) set standards
of quality to which products must conform if they are to be
considered satisfactory for use. These standards specify not
only a basic norm but also the upper and lower limits within
which a product will be considered satisfactory. These limits
are called tolerances and represent the allowable variation in
the measurable characteristic of the product so far as its use is
concerned. If the size of the product falls outside the range
of these specifications it is considered unfit for use. This being
so a manufacturer is faced with the problem of ensuring that
his products are of requisite quality and that the size of the
characteristics in terms of which their quality is measured does not
fall outside the maximum and minimum tolerances stipulated
by the assembly plants.

One method usually employed to
ensure that defective products are not passed into stock from
the factory is to have a 100 per cent inspection system, that is
to say that each unit of product is inspected to assess its quality.
But this system has some serious defects. The defective work
is detected only after it has been completed, and as such if
several processes have been carried out after the product became
faulty a considerable amount of waste takes place. Again,
human nature being what it is, even a hundred per cent inspection
is no guarantee about the quality of the product passed
into stock. Besides, the cost of such inspection may often be
formidable. A suitable system of inspection would be that
which (i) detects the defect as soon as it occurs, i.e., at its origin,
and (ii) introduces a continuous sample inspection in place of
100 per cent inspection. The technique of quality control
provides such a system. Thus statistical quality control is a
system which consists of (i) sampling inspection of manufactured
products at each stage of production, and (ii) statistical
inference regarding the variability of their quality with the help of
simple devices like charts etc. This technique originated in
the work of Walter A. Shewhart (born 1891) and it was during
World War II that its greatest development took place.

There are two aspects of statistical quality control, viz.,
(i) process control, and (ii) product or lot control, also called
acceptance inspection. The aim of the first is to evaluate the
performance of each individual process and thus to foresee the
variability in the quality of output of each process in the
immediate future. The aim of the second is to see that a lot
put in the market does not contain a large number of unsatisfactory
units.

Process Control 

The variations that occur in a production process may be 
attributed to two main causes : 

(1) Random or Chance Causes. These are very many in number,
each one exercising only a trivial effect on the quality of the
product. These causes are inherent in a production process in
the sense that they will continue to operate and cannot be
removed completely. Small variations in the skill of manual
operators or in the quality of raw materials are causes which
come under this category. The total effect of all these causes
on the quality of the output is so small and insignificant that it
may be ignored. In any case, since it is caused by innumerable
forces, any effort to eliminate them is likely to be uneconomic
and may even prove a failure.

(2) Assignable Causes. These are causes which can be
identified and are responsible for important and larger variations
in the quality of the product. These causes normally interfere
with the economic working of the plant and it is economically
necessary to eliminate them as soon as they are discovered.
Excessive wear on the cutting tool, mechanical fault in plant,
and bad handling of the machine by the operator are a few
examples of such causes.

If the variations in the quality of a particular product are
such that they can in entirety be attributed to chance causes
alone (that is to say if the variation is such as would occur in
random sampling from a stable population) the process is said to
be in a state of statistical control. If this is the case, the
variability of the quality of product cannot be altered unless
the production process itself is altered, and as such predictions
about the future behaviour of the data can be made. If,
however, it is said that a process is out of statistical control it
implies that certain assignable causes are present affecting the
variability of quality. The object of statistical quality control
is to detect such assignable causes as soon as they occur in the
production process.

From what has been said before it follows that the technique
of process control implies (i) determination of the way in which
variations in quality would be distributed when the process is
under control, and (ii) checking on a continuous basis
whether the variability in quality conforms to this distribution
or not. If the variations do not fall within this distribution it
is a warning that the process is going out of control and remedial
measures should be adopted.

Now if production is influenced by random causes only the
various units produced constitute a single homogeneous population
and the variations in its size can be described by
probability distributions. Thus if the mean value and standard
deviation of a certain measurable quality characteristic X (say
the length of a screw) are $\bar{X}$ and $\sigma$ respectively when the
process is in control, then if samples of size $n$ are taken the
sample mean $\bar{x}$ will be approximately normally distributed
about a mean $\bar{X}$ with a standard error $\sigma/\sqrt{n}$. This means that
approximately 1 out of 20 samples drawn from a population
will have means lying outside $\bar{X} \pm 1.96\,\sigma/\sqrt{n}$, and 1 out of 100
will have them lying outside the limits $\bar{X} \pm 2.5758\,\sigma/\sqrt{n}$. Very
rarely a sample mean would lie farther away from $\bar{X}$ than by
$3\,\sigma/\sqrt{n}$ just due to chance. This means that if we get a sample
mean further from $\bar{X}$ than $3\,\sigma/\sqrt{n}$ we should suspect there is
some assignable cause present which accounts for it.
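The chances quoted above follow from the normal distribution and can be verified with a few lines. The sketch relies only on the standard relation between the normal tail area and the complementary error function; the process figures at the end (mean 50, standard deviation 0.2, samples of 4) are hypothetical.

```python
import math


def chance_outside(z):
    """Probability that a normal value lies more than z standard errors from the mean."""
    return math.erfc(z / math.sqrt(2))


def limits_for_mean(process_mean, sigma, n, z=3.0):
    """Limits for a sample mean: process mean +/- z * sigma / sqrt(n)."""
    standard_error = sigma / math.sqrt(n)
    return process_mean - z * standard_error, process_mean + z * standard_error


for z in (1.96, 2.5758, 3.0):
    print(z, round(chance_outside(z), 4))

# Hypothetical process: mean length 50, standard deviation 0.2, samples of 4.
lower, upper = limits_for_mean(50.0, 0.2, 4)
print(round(lower, 2), round(upper, 2))
```

The first two chances come out at about 1 in 20 and 1 in 100, and the chance of a sample mean falling beyond three standard errors is under 3 in 1,000.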

This checking of whether a quality characteristic is under control is
done with the control charts.


Control Charts 

As stated above the main tool of process control is the
control chart. There are different types of control charts,
depending on different ways of assessing the quality of a
product. Many characteristics are measurable, such as length
of a screw, tensile strength of a yarn, resistance of a wire,
life of an electric bulb etc. Such variables are continuous and
the frequency distribution of these, when the process has only
chance variations, is the normal distribution. For controlling such
qualities two types of charts are plotted, one for the mean of the
measurements ($\bar{X}$-chart) and another for the range of the
measurements (R-chart).

Sometimes the characteristic representing the quality of a
product is discrete, such as thread count of a piece of cloth,
number of surface defects on a polished surface etc. In such
cases the number of defects on an item may be nil, one, two or
more. The total number of defects may be very large, but it
will always be an integral value and never a fraction like 2.7
defects etc. In such a case the distribution explaining the
number of items according to the number of defects on them, when
the process is in control, will be a Poisson distribution. Under such
circumstances a control chart for the average number of defects per
item (C-chart) is plotted.

There is still a third way in which quality may be measured.
Many times it may be possible only to classify a produced item
as good or bad, e.g., a container produced may have a leak.
Whether the leak is at one place or at more than one place,
the item produced is useless. In such a case to assess quality
it is only possible to determine the proportion of bad or defective
items in a sample. The distribution explaining the chance
variations in the proportion of defectives is a binomial distribution
provided the sample selected from the lot is relatively very
small. In such a case a control chart for the proportion of defectives
(p-chart) or the number of defectives in a sample of fixed size is
prepared.

All types of control charts (viz., '$\bar{X}$', 'R', 'C' or 'p' charts)
are similar in composition and construction. All of them
represent how a quality characteristic is changing its value from
one sample to another sample. The various stages in their
construction are :

(1) Take sample number along the X-axis and
the quality characteristic, i.e., $\bar{X}$, R, p or C, along the Y-axis.

(2) Mark the central line corresponding to the average
value or the target value of the quality characteristic, i.e., what
the process is capable of achieving in terms of the quality
characteristic.

(3) Plot the upper and lower tolerance limits. These
limits point out the values of qualities beyond which products
will not be accepted in the market.

(4) Calculate the control limits for the quality and mark
them. These limits point out the possible variations due to
chance. Control limits should always be within tolerance
limits. Calculation of these limits is explained in a later section.

(5) Select samples at fixed intervals of time from the items
produced. Assess the characteristic representing the quality of
the product. Plot their values against the various sample
numbers. If the value plotted is $\bar{X}$, the chart is called $\bar{X}$-chart.
If the value plotted is p (proportion of defectives in the
sample) the chart is called p-chart etc.




A process is considered out of control and an action to
check and correct the process is taken when

(1) a plotted point falls outside the control limits ;

(2) several points lie close to the control limits ; and

(3) there is an unusual non-random arrangement of points.

These control charts have the following advantages :

(1) They provide visual aids, (2) are easy to prepare, (3) are
a simple device for checking whether variations conform to
chance variations or are more than that, (4) give early warning
of a trouble, and (5) they are flexible. From the unusual
arrangement of points an alert person might suspect the incoming
trouble at quite an early stage.

Calculation* of Control Limit* 

In drawing conclusions with the help of control charts two
types of errors can be made : (i) concluding that the process is
out of control when actually it is in control, and (ii) concluding
that the process is in control when it is actually out of control. The
control limits are so set that there is some kind of economic
balance between these two errors. They are so set that the
engineer is in a position to detect serious troubles and may not
waste time on minor ones. It has been proved mathematically,
and also found by observation, that even if the distribution of
items produced is not very close to the normal distribution, the
distribution of variations in the arithmetic means of the
measurements taken from random samples drawn from various lots of
production, when the process is in control, follows the normal
distribution very closely. In a normal distribution there is only a
3 in 1,000 chance that a value lies outside the range
A.M. $\pm$ 3 St. Deviation. With these limits as control limits it
will be highly improbable that a plotted point falls outside the
control limits, provided the process is in control. Thus if a plotted
point is outside these limits (called 3 sigma limits), it is a warning
that variations are perhaps not due to chance but due to some
serious trouble. So control limits in all the charts are usually
plotted at 3 sigma limits.
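The "3 in 1,000" figure quoted above can be checked numerically. The short sketch below is only an illustration, not part of the original text: the function name `prob_outside` is hypothetical, and it assumes an exactly normal distribution.

```python
import math


def prob_outside(k: float) -> float:
    """Chance that a normal variate lies outside mean +/- k standard deviations."""
    # For a standard normal Z, P(|Z| > k) = erfc(k / sqrt(2)).
    return math.erfc(k / math.sqrt(2.0))


if __name__ == "__main__":
    # Print the two-sided tail chance for 1, 2 and 3 sigma limits.
    for k in (1.0, 2.0, 3.0):
        print(f"outside mean +/- {k:.0f} sigma: {prob_outside(k):.4f}")
```

For k = 3 the chance comes to roughly 0.0027, i.e., a little under 3 in 1,000, which is why a point beyond the 3 sigma limits is treated as a warning rather than as a chance variation.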

In $\bar{X}$-charts these limits will be general average $\pm$ 3 $\times$ St.
Error of $\bar{X}$ or A. mean. Symbolically they may be represented
as $\bar{X}' \pm 3\sigma/\sqrt{n}$, where $\bar{X}'$ is the standard value or the target
value. It is generally found at the start by taking the A. mean
of the mean values of the first 20 or 25 samples when the
process is in control. '$n$' in the formula is the size of the sample selected
and '$\sigma$' is the standard deviation of the whole production
(population). In the usual problems of testing of hypotheses $\sigma$ is
estimated from the standard deviations of the samples. But in
plotting control charts $\sigma$ is usually estimated from the average
range of the first 20 or 25 samples. This estimation is usually
done with the help of standard tables called $A_2$ tables which
directly give the value of $3\sigma/\sqrt{n}$ as a ratio of range for
different sizes of samples. This is done to save time in
calculations and to reduce the technique of plotting the chart
to a very simple procedure so that a person with ordinary
ability may easily prepare them. Similarly, in the case of the $R$-chart,
control limits are set at average range $\pm$ 3 $\times$ St. Error of Range.
These values are estimated from standard tables called $D_3$ and
$D_4$ tables. These tables give the values of the lower and upper
control limits as a ratio of range for different sizes of samples.


$A_2$, $D_3$ and $D_4$ Tables*

n (size of sample)      $A_2$       $D_3$      $D_4$

        2              1.881       0        3.267
        3              1.023       0        2.575
        4              0.7285      0        2.282
        5              0.5768      0        2.115
        6              0.4833      0        2.004
        7              0.4193      0.076    1.924
        8              0.3726      0.136    1.864
        9              0.3367      0.184    1.816
       10              0.3082      0.223    1.777
       11              0.2851      0.256    1.744
       12              0.2658      0.284    1.716


When plotting the $\bar{X}$-chart and $R$-chart usually small
samples of size up to 12 are selected out of lots of approximately
100 items produced. This is only because of practical convenience.

* Reproduced from Practical Business Statistics by Croxton and
Cowden, Third Edition.
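The use of the table can be sketched in a few lines of code. The sketch below is an illustration only: the function name `xbar_r_limits` and the three sample means and ranges used in the demonstration are hypothetical, and the factors are simply copied from the table above for sample sizes 2 to 12.

```python
# Factors copied from the A2, D3 and D4 table (sample sizes 2 to 12).
A2 = {2: 1.881, 3: 1.023, 4: 0.7285, 5: 0.5768, 6: 0.4833, 7: 0.4193,
      8: 0.3726, 9: 0.3367, 10: 0.3082, 11: 0.2851, 12: 0.2658}
D3 = {2: 0.0, 3: 0.0, 4: 0.0, 5: 0.0, 6: 0.0, 7: 0.076,
      8: 0.136, 9: 0.184, 10: 0.223, 11: 0.256, 12: 0.284}
D4 = {2: 3.267, 3: 2.575, 4: 2.282, 5: 2.115, 6: 2.004, 7: 1.924,
      8: 1.864, 9: 1.816, 10: 1.777, 11: 1.744, 12: 1.716}


def xbar_r_limits(sample_means, sample_ranges, n):
    """Return (lower limit, central line, upper limit) for both charts.

    sample_means and sample_ranges hold one value per sample; n is the
    common sample size and must be between 2 and 12.
    """
    grand_mean = sum(sample_means) / len(sample_means)
    mean_range = sum(sample_ranges) / len(sample_ranges)
    half_width = A2[n] * mean_range  # stands in for 3 * sigma / sqrt(n)
    return {
        "xbar": (grand_mean - half_width, grand_mean, grand_mean + half_width),
        "range": (D3[n] * mean_range, mean_range, D4[n] * mean_range),
    }


if __name__ == "__main__":
    # Hypothetical figures: three samples of size 5.
    limits = xbar_r_limits([10.0, 12.0, 11.0], [4.0, 6.0, 5.0], n=5)
    print(limits["xbar"])
    print(limits["range"])
```

Keeping the factors in look-up tables mirrors the purpose of the printed tables: no standard deviation has to be calculated, only means and ranges.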

For estimating control limits for the $p$-chart (proportion of
defectives), first the average value of $p$, i.e., $\bar{p}$, is estimated from the first
20 to 25 samples. This $\bar{p}$ gives the central line. From the Binomial
distribution we know the standard error of $p$ is $\sqrt{\bar{p}\bar{q}/n}$,
where $\bar{q} = 1 - \bar{p}$ and $n$ is the size of the sample. So the control limits
for the $p$-chart are $\bar{p} \pm 3\sqrt{\bar{p}\bar{q}/n}$. If the control chart of number
of defectives is plotted, the standard error would be $\sqrt{n\bar{p}\bar{q}}$ and
hence the control limits would be average number of defectives $n\bar{p} \pm 3\sqrt{n\bar{p}\bar{q}}$.
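The two sets of limits described above can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, the demonstration figures (average proportion 0.05, sample size 100) are invented, and a negative lower limit is replaced by 0 because a proportion or a count cannot be negative.

```python
import math


def p_chart_limits(p_bar: float, n: int):
    """(lower, central, upper) 3 sigma limits for the proportion of defectives."""
    se = math.sqrt(p_bar * (1.0 - p_bar) / n)
    return max(0.0, p_bar - 3.0 * se), p_bar, min(1.0, p_bar + 3.0 * se)


def np_chart_limits(p_bar: float, n: int):
    """(lower, central, upper) 3 sigma limits for the number of defectives."""
    centre = n * p_bar
    se = math.sqrt(n * p_bar * (1.0 - p_bar))
    return max(0.0, centre - 3.0 * se), centre, centre + 3.0 * se


if __name__ == "__main__":
    # Hypothetical process: 5 per cent defectives, samples of 100 items.
    print(p_chart_limits(0.05, 100))
    print(np_chart_limits(0.05, 100))
```

The second chart is the first one multiplied through by the sample size, so either may be plotted when all samples are of the same size.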

Similarly for the $C$-chart, first the average number of defects per
item ($\bar{C}$) is estimated. Theory of the Poisson distribution gives the
standard error of $C$ as $\sqrt{\bar{C}}$. So the control limits are $\bar{C} \pm 3\sqrt{\bar{C}}$.

Examples of Different Charts

Example 1 :

$\bar{X}$ and $R$ Charts. The nominal dimension of a component is
0.1350" with a tolerance of $\pm$ 0.0032". The data given below
are in 0.0001" units above and below the nominal dimension.
Plot the control chart for average and range :

Observation                        Sample No.
No.            1     2     3     4     5     6     7     8     9    10

1              5     0   -15   -15    -5    10    10    -5     5    20
2             25    10   -10     0     5   -15    15    -5     5   -10
3            -25     0     0   -25     5   -10     5     5    -5     0
4              0     5   -10    -5     0   -10   -10   -15    -5    -5
5             15     0     0     5   -10     5   -15    10    10     5

Solution : 







Mean and range of 5 observations for various samples in
units of 0.0001 inch are given as :

Sample No.          1    2    3    4    5    6    7    8    9   10
Mean ($\bar{X}$)         4    3   -7   -8   -1   -4    1   -2    2    2
Range ($R$)         50   10   15   30   15   25   30   25   15   30

Tolerance limits are 0.1382" to 0.1318".





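Process average ($\bar{\bar{X}}$) $= \dfrac{\Sigma \bar{X}}{N} = \dfrac{-10}{10} = -1$, i.e., $0.1350 - 1 \times 0.0001 = 0.1349$ inch.

Mean Range ($\bar{R}$) $= \dfrac{\Sigma R}{N} = \dfrac{245}{10} = 24.5$, i.e., $24.5 \times 0.0001 = 0.00245$ inch.

$A_2$ value for a sample of size 5 from the table $= 0.5768$

$\dfrac{3\sigma}{\sqrt{n}} = A_2 \bar{R} = 0.00245 \times 0.5768 = 0.0014$

Control limits for $\bar{X}$-chart $= \bar{\bar{X}} \pm \dfrac{3\sigma}{\sqrt{n}} = 0.1349 \pm 0.0014$,
or 0.1363" to 0.1335".

$D_3$ and $D_4$ values for a sample of size 5 are 0 and 2.115.

Lower control limit for $R$-chart $= D_3 \times \bar{R} = 0$
and upper control limit for $R$-chart $= D_4 \times \bar{R} = 2.115 \times 0.00245 = 0.0052$".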
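The arithmetic of this example can be retraced with a short script. It is a sketch only: the variable and function names are hypothetical, the sample means and ranges are those stated in the solution, and the factors 0.5768, 0 and 2.115 are the table values for samples of size 5.

```python
UNIT = 0.0001     # one data unit, in inches
NOMINAL = 0.1350  # nominal dimension, in inches
A2, D3, D4 = 0.5768, 0.0, 2.115  # table factors for samples of size 5

# Sample means and ranges from the solution, in units of 0.0001 inch.
means = [4, 3, -7, -8, -1, -4, 1, -2, 2, 2]
ranges = [50, 10, 15, 30, 15, 25, 30, 25, 15, 30]

grand_mean = sum(means) / len(means)    # process average, in data units
mean_range = sum(ranges) / len(ranges)  # mean range, in data units

# Control limits, still in data units.
xbar_lcl = grand_mean - A2 * mean_range
xbar_ucl = grand_mean + A2 * mean_range
r_lcl = D3 * mean_range
r_ucl = D4 * mean_range


def to_inches(units: float) -> float:
    """Convert a deviation in data units into an actual dimension in inches."""
    return NOMINAL + units * UNIT


# Sample numbers (1-based) whose plotted point falls outside the limits.
flagged_means = [i + 1 for i, m in enumerate(means)
                 if not xbar_lcl <= m <= xbar_ucl]
flagged_ranges = [i + 1 for i, r in enumerate(ranges)
                  if not r_lcl <= r <= r_ucl]

if __name__ == "__main__":
    print("X-bar chart:", to_inches(xbar_lcl), to_inches(grand_mean),
          to_inches(xbar_ucl))
    print("R chart    :", r_lcl * UNIT, mean_range * UNIT, r_ucl * UNIT)
    print("outside limits:", flagged_means, flagged_ranges)
```

With these figures both lists of flagged samples come out empty, i.e., no plotted mean or range falls outside its 3 sigma limits.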

Now to plot the control chart take the values of $\bar{X}$ and $R$
along the Y axis and sample number along the X axis. Here the target
value 0.1350" and the process average 0.1349" are different.
This means the machine has been set for 0.1350" but actually it is
capable of producing an average size of 0.1349". So the central line
will be marked at 0.1349". If the actual target value 0.1350" is to
be achieved, some correction must be made in the process.
Plotting these values, control charts are as shown on page 616.




Example 2 :

Sample No.            1    2    3    4    5    6    7    8    9   10
Size                200  200  225   75  190  210  500  212  200  188
No. of defectives    20   19   23    8   18   22   51   21   19   19

Draw the control chart for number of defectives.























Solution :


In the present case the size of sample is not the same. Control
limits depend on the size of sample ($n$). So this means for each
sample there would be separate control limits. But if the variation
in size of sample is not large, the average size of sample is
usually taken for calculating the control limits. In the present
example except for sample Nos. 4 and 7 all other samples are
approximately of the same size. So separate control limits
may be calculated for samples 4 and 7 and for the others the average
size of the sample may be used.

Average size of sample (except sample Nos. 4 and 7) $= \dfrac{1625}{8} = 203.1$.
The fraction of defectives is more or less the same in all cases.

So $\bar{p} = \dfrac{\text{Total defectives in all the ten samples}}{\text{Total items in all the ten samples}} = \dfrac{220}{2{,}200} = 0.1$

Standard error of number of defectives for average size,
i.e., $\sqrt{n\bar{p}\bar{q}} = \sqrt{203.1 \times 0.1 \times 0.9} = 4.28$.

The average number of defectives for 8 samples leaving
4 and 7 $= \dfrac{220 - 59}{8} = 20.1$. This gives the central line.

3 sigma control limits for average size of sample
$= 20.1 \pm 3 \times 4.28 = 7.3$ to $32.9$.

For sample Nos. 4 and 7, due to the difference in size,
the average number of defectives, i.e., the central line, cannot
be at 20.1. The position of their central lines can be found
by multiplying the size of the sample and the proportion of
defectives, i.e., $\bar{p}$. So for number 4 the central line will be at $75 \times 0.1$,
i.e., 7.5 and for number 7 it will be at $500 \times 0.1$, i.e., 50. Similarly,
the standard error for number 4 $= \sqrt{75 \times 0.1 \times 0.9} = 2.598$ and for No. 7
it will be $\sqrt{500 \times 0.1 \times 0.9} = 6.71$.

Therefore, control limits for No. 4 sample $= 7.5 \pm 3 \times 2.598$,
i.e., 15.3 and 0, and control limits for No. 7 sample $= 50 \pm 3 \times 6.71$,
i.e., 70.1 and 29.9.
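As a cross-check, the limits may also be worked out separately for every sample instead of using the average size. The sketch below does this for the data of the example; it is an illustration only, the names are hypothetical, and it deliberately departs from the short-cut of the text by giving each sample its own limits.

```python
import math

sizes = [200, 200, 225, 75, 190, 210, 500, 212, 200, 188]
defectives = [20, 19, 23, 8, 18, 22, 51, 21, 19, 19]

# Overall proportion of defectives in all ten samples taken together.
p_bar = sum(defectives) / sum(sizes)


def np_limits(n: int, p: float):
    """(lower, central, upper) 3 sigma limits for the number of defectives."""
    centre = n * p
    se = math.sqrt(n * p * (1.0 - p))
    # A negative lower limit is replaced by 0, as in the text.
    return max(0.0, centre - 3.0 * se), centre, centre + 3.0 * se


limits = [np_limits(n, p_bar) for n in sizes]

# Sample numbers (1-based) whose number of defectives lies outside its limits.
out_of_control = [
    i + 1
    for i, (d, (lo, _, hi)) in enumerate(zip(defectives, limits))
    if not lo <= d <= hi
]

if __name__ == "__main__":
    for i, (n, d, (lo, c, hi)) in enumerate(zip(sizes, defectives, limits), 1):
        print(f"sample {i:2d}: n={n:3d} defectives={d:2d} "
              f"limits {lo:6.2f} to {hi:6.2f} (central line {c:5.1f})")
    print("outside limits:", out_of_control)
```

Working sample by sample costs nothing on a machine, whereas the average-size short-cut of the text saves labour when the chart is prepared by hand.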



Plotting these values, control chart is as shown below : 


[Figure: Control chart for number of defectives, plotted against sample number]

Fig. 22.2


Example 3 :

$C$-Chart. The following 20 figures correspond to the number of
defects (spots) in 20 sheets of 1' $\times$ 2' selected as 20 samples
during a process of manufacturing metal sheets. Plot a suitable
control chart.

27, 26, 20, 19, 12, 16, 22, 29, 13, 17, 22, 26, 25, 18, 17, 13, 19, 15, 18, 22.

Solution : 


Total number of defects in 20 samples $= 396$

Average number of defects per sheet ($\bar{C}$) $= \dfrac{396}{20} = 19.8$

Standard error of number of defects $= \sqrt{\bar{C}} = \sqrt{19.8} = 4.45$.

$\therefore$ 3 sigma limits for $C$-chart $= \bar{C} \pm 3\sqrt{\bar{C}} = 19.8 \pm 3 \times 4.45$
$= 33.15$ to $6.45$.
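The same figures can be obtained with a few lines of code. This is only a sketch of the calculation shown above; the variable names are hypothetical.

```python
import math

# Number of defects (spots) observed on each of the 20 sheets.
defects = [27, 26, 20, 19, 12, 16, 22, 29, 13, 17,
           22, 26, 25, 18, 17, 13, 19, 15, 18, 22]

c_bar = sum(defects) / len(defects)  # central line
se = math.sqrt(c_bar)                # standard error under the Poisson law
lcl = max(0.0, c_bar - 3.0 * se)     # a count cannot fall below zero
ucl = c_bar + 3.0 * se

# Sheet numbers (1-based) whose count lies outside the 3 sigma limits.
outside = [i + 1 for i, c in enumerate(defects) if not lcl <= c <= ucl]

if __name__ == "__main__":
    print(f"central line {c_bar:.1f}, limits {lcl:.2f} to {ucl:.2f}")
    print("outside limits:", outside)
```

Since the smallest count is 12 and the largest 29, no sheet falls outside the limits, and the list `outside` is empty.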











To plot the control chart plot the central line at 19.8 defects
and control limits at 33.15 and 6.45. The chart is as below :

[Figure: Control chart for number of defects ($C$-chart)]

Fig. 22.3

Note. In example numbers 2 and 3 tolerance limits are not
plotted as the figures for them are not available.

Product Control 

Another aspect of statistical quality control is product
control. The object of this is to decide whether to accept
or reject a lot produced for marketing purposes. This is also
done by sampling inspection. First a sample of small size is
taken from the lot. It is usually the same sample which is
used for process control. If the quality of the product as
assessed by this sample is very good, the lot is accepted for
marketing. If it is very bad the lot is rejected. If no clear-cut
conclusion can be drawn a second sample is selected from
the lot and the lot is accepted or rejected on the basis of the two
samples combined. Sometimes more samples may have to be
taken for a clear-cut decision. The plans for selecting samples
depend on the following objects :









(1) The producer's risk, i.e., the chance of rejecting a good lot,
is small (a specified value).

(2) The consumer's risk, i.e., the chance of accepting a bad lot,
is small (another specified value).

(3) The average quality of goods sent out of the factory
may not be worse than some specification.

(4) The amount of inspection be minimum.

If the process is in control, the variability in quality will be
small and hence both the extent and the frequency of inspection
needed for accepting or rejecting the lot will be very small.
But if the process is out of control, i.e., the
variability in quality is large, inspection would be more
frequently needed as some items may lie outside the tolerance
limits.
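The two risks named above can be illustrated with a simplified plan in which only one sample is taken. The sketch below is not the plan of the text (which allows a second sample); the function name, the sample size of 50, the acceptance number 2 and the "good" and "bad" fractions defective are all hypothetical, and the lot is assumed large enough for the Binomial distribution to apply.

```python
from math import comb


def prob_accept(n: int, c: int, p: float) -> float:
    """Chance of finding at most c defectives in a sample of n items
    when the lot contains a fraction p of defectives (Binomial law)."""
    return sum(comb(n, d) * p**d * (1.0 - p) ** (n - d) for d in range(c + 1))


# Hypothetical plan: inspect 50 items, accept the lot if 2 or fewer are defective.
N_SAMPLE, ACCEPT_NO = 50, 2
P_GOOD, P_BAD = 0.01, 0.10  # assumed fractions defective of a good and a bad lot

producer_risk = 1.0 - prob_accept(N_SAMPLE, ACCEPT_NO, P_GOOD)  # good lot rejected
consumer_risk = prob_accept(N_SAMPLE, ACCEPT_NO, P_BAD)         # bad lot accepted

if __name__ == "__main__":
    print(f"producer's risk: {producer_risk:.3f}")
    print(f"consumer's risk: {consumer_risk:.3f}")
```

Raising the sample size or changing the acceptance number alters both risks, and a plan is chosen so that each stays below its specified value with as little inspection as possible.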

Advantages of Statistical Quality Control

In conclusion the advantages may be summarised as below :

(1) It is a technique which provides a continuous inspection
of the product at various stages of the manufacturing
process.

(2) It eliminates the need of 100 per cent inspection of the
finished product and is usually more efficient and less
costly than 100 per cent inspection.

(3) It reduces waste of time and material to an absolute minimum
by giving an early warning about the occurrence
of defects.

(4) The technique is quite simple and can be operated by
semi-skilled operators.

(5) Rejections by buyers are reduced almost to nil.

(6) Savings in terms of the factors stated above mean less
cost of production and hence may ultimately lead to
more profits.



EXERCISES


1. The following data give readings for 10 samples of size 6 in the
production of a certain component :

Sample    Mean ($\bar{X}$)     S.D. ($s$)      Range ($R$)

   1      [illegible]    30.5          [illegible]
   2      508            [illegible]   128
   3      505            39.5          100
   4      582            [illegible]   91
   5      557            27.4          [illegible]
   6      337            24.2          [illegible]
   7      511            46.7          [illegible]
   8      614            [illegible]   28
   9      707            [illegible]   37
  10      753            33.9          80

Draw the control charts for $\bar{X}$, $s$ and $R$, calculating the limits for $\bar{X}$ in
two ways. Can the within group variability be regarded as homogeneous ?
Can one assume that all groups are from a homogeneous lot ?

2. Average proportion of defectives in first ten samples of size 150 each was
observed to be [illegible]. What are the 1 in 1,000 control limits ? If later
on it is noticed that the machine is producing only 2% defective items,
what are the revised control limits ?

[11.14]

3. It was found that when a manufacturing process is under control, the
number of defectives per sample batch of 10 is 1.2. What limits would
you set in a quality control chart based on the examination of defectives
in sample batches of 10 ?

4. In a plant making about [illegible] trucks per day the number of defects at the
end of the assembly line is noted. Average numbers of defects per truck for
7 weeks are [illegible], [illegible], [illegible], [illegible], 2.1, 2.0 and 1.7. Find out the figures
for the central line and control limits for the total number of defects $C$
observed in one day's production and for the daily average ($\bar{C}$) for number
of defects per truck.




Chapter 23 

Indian Statistics 


Historical Background 

Indian statistics date back to an ancient period. Kautilya's
Arthashastra as also Ain-i-Akbari provide a lot of statistical
information regarding population, wages and prices of the
times to which they refer. But the modern history may be
said to commence from the year 1895 when a ‘Statistical
Bureau’ headed by a Director General of Statistics was set up.
The establishment of this ‘bureau’ in fact marks the real
beginning of the Department of Commercial Intelligence and
Statistics, a department which till recently was the chief
agency by which most of the statistical material pertaining to
India was compiled and published.

Before this date statistics pertaining to agriculture were
collected by departments of agriculture that were created in
1881 as a result of the recommendations of the Famine
Commission of 1880. The statistics of foreign trade, prices
and industries, on the other hand, were being collected and
published by the Departments of Finance and Commerce. The
‘Statistical Bureau’ was now to deal with the statistics of both
these departments and as such it can be said that the creation
of this bureau was the first attempt towards the co-ordination of
statistical data. This department was merged with the newly
created department of Commercial Intelligence in 1905, to be
separated again in 1914 as a consequence of the shifting of the
headquarters of the Government of India from Calcutta to
Delhi. This separation could not be allowed to continue in
view of the financial considerations and the departments were
amalgamated again in 1922. This combined department was
called the office of the Director General of Commercial
Intelligence and Statistics.

Another important landmark in the history of Indian
statistics is the appointment of the ‘Economic Inquiry Committee’
in 1925 under the chairmanship of Sir M. Visveswarayya.
This committee was set up to inquire into the question of the
statistical data available and the desirability and possibility of
supplementing it and of undertaking an economic inquiry.
The committee in its report gave a comprehensive survey of
the statistical material that was then available in India and
suggested that if Indian statistics were to be maintained on a
satisfactory basis statistics of all departments, both provincial
and central, should come under the supervision of a Central
Authority. It also suggested that an official Year Book on the
lines of Dominion Year Books should be compiled. Due partly
to the opposition of some of the Provincial Governments and
partly to the recommendation of the Royal Commission on
Agriculture, no action was taken on the recommendations of
this committee. The only event of importance during this
period was the establishment of a ‘Statistical Research Bureau’
in 1933 with a view to providing an organisation for the continuous
analysis and interpretation of economic statistics.

The next important step in the history of statistical
development in India was the Bowley-Robertson Inquiry, in
1934, into the possibility of an economic census in India. The
committee gave detailed suggestions in regard to a census of
production and the measurement of the National Income. It
also recommended the compilation of a Guide to Current
Official Statistics for the centre and for the provinces. These
two experts also recommended the appointment of a permanent
economic staff including a Director of Statistics at the
centre. This organisation was to conduct the census of population
and the census of production, and was also to co-ordinate all
central and provincial statistics. As a result of these recommendations
the Government of India decided to set up a Central
Statistical Organisation. But this could not be done owing to
financial stringency. An office of the Economic Adviser,
however, was created in 1938 and the Statistical Research
Bureau was merged with this office. World War II came
in 1939. The position in India regarding the availability of
statistical material at this time can be described in the words
of the Bowley-Robertson Committee as follows : “The
statistics in India have largely originated as a bye-product of
administrative activities, such as the collection of land revenues,
or from the need of information relating to emergencies
such as famines. Only in the case of population census and
to some extent of foreign trade has there been an organisation
whose primary duty is the collection of information. As a
result the statistics are uncoordinated and issued in various
forms by separate Departments.”

The outbreak of the war in 1939 and the consequent
assumption by the Government of India of a two-fold responsibility
for the execution of actual military operations, including
the procurement of supplies for the forces and the co-ordination
and rationing of civil needs, revealed the inadequacy of the
then existing statistics in India. The various Departments of
the Government of India began to feel the need for more and
more statistics for the effective discharge of their duties.
Statistics thus could not be allowed to continue as a bye-product
of administrative activities and had to be collected by specialised
agencies for helping the administrative machinery itself.
For example, the Department of Industries and Civil Supplies,
which was created for the purpose of regulating supplies of
various articles for civilian consumption, had to compile statistical
information relating to essential articles in connection
with the administration of controls. For purposes of ensuring
completeness and accuracy regarding the data on industrial
output a legislation was enacted in the shape of the Industrial
Statistics Act of 1942. Besides this, new statistical material
was being collected by the various Departments of the Government
of India. The Department of Labour initiated a scheme
of constructing working class cost of living index numbers for
important industrial centres in India. The War Transport
Department collected new data relating to motor vehicles. The
Department of Education, Health and Land collected additional
information in regard to acreage ; and the Department
of Commerce obtained detailed statistics pertaining to import
and export trade controls.

All these developments that took place during the period
of the war and the improvements that came about in the
statistical system of India after independence go to support the
contention that the statistical system in any country is largely
determined by the range of Government activities, and the
manner in and the extent to which statistics are required and
used for purposes of administration.

The coming of independence with new responsibilities for
wider social and economic functions led to a further demand
for statistics and promotion of statistical activities. What was
more important was the emphasis, from the point of view of
overall economic policy, on a single synoptic picture of the
information held, and consequently on aspects of proper
co-ordination and control. With the formulation of the First,
Second and Third Plans for the country, the need for new types
of statistics for judging the progress of the plan schemes, overall
assessment of the plans and evaluation surveys was felt and
suitable orientation of the existing statistical system both at the
centre and the states was attempted. An additional stimulus
was provided by the growing statistical requirements of
international organisations like the United Nations and its specialised
agencies, and their attempts to promote suitable statistical
standards of international comparability with a view to developing
an integrated statistical system.

Nature and Structure of the Indian Statistical 
Organisation 

The nature and structure of the statistical organisation is
necessarily governed by the constitutional set-up. The responsibility
for collection of statistics as between the central government
and state governments is determined by the responsibility
for the subject-matter concerned. Under the Indian Constitution
this responsibility is shared in accordance with a three-fold
classification of subject-fields. Items like foreign trade, banking
and currency and population are wholly allocated to the centre.
There is also a common category of what are known as concurrent
subjects, as for example, industry, where both the centre and
the states can operate simultaneously to meet respective requirements.
Others, like agriculture and education, are assigned
to the states, although inquiries and statistics relating to these
items also figure in the concurrent list. In actual practice, it
must be added, even in cases where states have the
primary responsibility for the subject fields, the centre acts as
the co-ordinating authority for the presentation of the data on
an all-India basis. At the centre, and in some of the states,
there is a division of subjects as amongst the various Ministries
and Departments. Thus, the responsibility for the processing
of data collected, which is decentralised at present, devolves
on the statistical organisations in the different Ministries of the
Central Government and in the several departments of the State
Governments. There are about 115 statistical units in the
Central Government employing in all about 10,800 workers with
an annual budget allotment of about Rs. 4 crores. In the
states there are about 200 statistical organisations employing
11,700 workers. The Central Statistical Organisation (C.S.O.)
set up in 1951 as an attached office of the Cabinet Secretariat
of the Government of India has been serving as a co-ordinating
and advisory body. The Government of India has since set up
a Department of Statistics in the Cabinet Secretariat in 1961
and entrusted it with the responsibility for bringing about
necessary co-ordination between various statistical agencies. In
its functions the Department is assisted by the C.S.O. and has
the benefit of advice from the Central Technical Advisory
Council on Statistics and a Standing Advisory Committee for the
Department of Statistics. During recent years all the states have
set up State Statistical Bureaus. The State Statistical Bureaus
have the responsibility for overall co-ordination within their
respective territories.



Statistical Organisation at the Centre

Most of the Ministries in the Government of India collect
or use statistics in some manner or the other and have their own
statistical units. They are of different sizes, are in varying
stages of development and are charged with distinctive functions.
They lend themselves, however, to the following broad
classification :

A. Processing data coming as bye-products of Administration.

B. Organisations associated with Control Agencies.

C. Organisations specially set up for collection and compilation
of data like Directorate General of Commercial
Intelligence and Statistics, Industrial Statistics
Wing of the Central Statistical Organisation, Labour
Bureau, Directorate of Economics and Statistics, etc.

D. Research Organisations like Statistical Division of the
Indian Council of Agricultural Research.

E. The National Sample Survey and Other Surveys.

F. The Central Statistical Organisation set up in 1951.

Statistical Organisation in the States

Statistical organisations in the States are of more recent origin
than their counterparts at the centre. Since the war years,
and particularly in the wake of the recommendations of the
Gregory Committee of 1946, State Statistical Bureaus have
been set up in most of the States either as independent statistical
organisations or as part of the combined economic and
statistical set-up. The State Statistical Bureaus are generally
entrusted with the task of (i) co-ordination of statistics collected
by different departments, (ii) publication of a statistical Abstract
assembling all essential statistics, (iii) organising special
enquiries and surveys, (iv) liaison between statistical organisations
of the centre and other States, and (v) statistical work
relating to planning. There are, however, considerable differences
between the different State Statistical Bureaus in the
areas of their responsibility for collection of statistics. Thus,
while in some states statistics are almost centralised in the
Bureaus, in most other States the collection of agriculture,
labour and vital statistics generally falls outside the scope of
work of the State Statistical Bureaus. Some of these Bureaus,
as those in West Bengal, Uttar Pradesh and Bombay, have been
conducting a number of socio-economic enquiries for collection
of data required for formulation of policy in these States, from
time to time. In recent years many of the State Statistical
Bureaus have joined the collection programme of the National
Sample Survey for conducting multi-purpose surveys on a
continuing basis.

Another important development which has taken place in
recent times in the State statistical system is the programme
of central assistance extended to States under the Second and
Third Five Year Plans for expansion of the statistical organisations
and activities in the States. Strengthening of State Statistical
Bureaus for planning needs, setting up of District Statistical
offices, setting up of Administrative Intelligence Units for compilation
of CD/NRS statistics, and training of statistical personnel
are some of the State statistical schemes for which central assistance
was extended. Another noteworthy development in the
statistical system in the States has been the appointment of a
statistical assistant, designated differently in different States as
Progress Assistant, Statistical Officer, etc., in each of the development
blocks. Through this agency it would be possible to
co-ordinate statistics of all types emanating at the block level.
For supervising statistical work in the development blocks and
also for co-ordinating the activities at the district level, district
statistical offices were also set up on a phased basis.

1. Cabinet Secretariat

(a) Department of Statistics. In view of the growing importance
of statistics in connection with all schemes of social and
economic development, particularly in the context of the successive
Five Year Plans, the Government of India constituted the
Department of Statistics in April 1961. The functions of the
Department include co-ordination of the statistical activities
of the various Central and State agencies, the setting up of
standards and norms in connection with the collection and
presentation of statistics relating to various subjects and the
issue of general directions to secure the necessary statistics in
appropriate form for successful planning and policy formulation
and implementation. The Department consists of the Additional
Secretary to the Cabinet, the Director of the C.S.O. who is
ex-officio Joint Secretary, one Deputy Secretary and one Under
Secretary, 5 Section Officers and the necessary complement of
the supporting Ministerial staff.

The two main statistical offices under the Department of 
Statistics are the Central Statistical Organisation and the 
Directorate of National Sample Survey. 

(b) Central Statistical Organisation (C.S.O.). It was set up
in May 1951 as an attached office of the Cabinet Secretariat.
Recently the functions of the C.S.O. have considerably
expanded by the transfer of the National Income Unit from the
Finance Ministry to the C.S.O. in 1954, the setting up of a separate
unit to look after statistical work relating to Planning in
collaboration with the Planning Commission, etc. Its functions
now consist of the following :

(1) To prepare and publish regular and ad hoc publications,
such as the Annual Abstract of Statistics, Monthly Abstract
of Statistics, the Weekly Supplement to Monthly Abstract of
Statistics, Estimates of National Income, Sample Surveys of
current interest in India, etc.

(2) To serve as a channel of communication with the
U.N. Statistical Organisation, both with regard to observance
of international conventions relating to economic statistics and
provision of data required for the regular publications and for
various ad hoc purposes.

(3) To represent graphically current statistics with a view
to throwing light on the developing economic situation.

(4) To advise the Ministries and other Government
agencies on statistical matters and to arrange inter-departmental
discussions on statistical matters.




(5) To co-ordinate the statistical work of the Ministries
and other Government agencies with a view to eliminating
and preventing unnecessary duplication and reducing the overall
cost to a minimum.

(6) To develop definitions and standards for improving
national and international comparability and to give continuing
attention to the improvement of the quality of information
required by Government.

(7) To keep in continuous touch with national organisations
in other countries of the world in the context of the latest
development in methodology as well as organisation.

(8) To undertake statistical work relating to planning.

(9) To estimate Annual National Income and Research in
National Income.

(10) To organise and conduct training courses in official
statistics.

(11) To organise meetings of the Conference of Central and
State Statisticians, Standing Committee of Department Statisticians,
Working Parties, etc.

(12) To assess the present position with regard to population
studies and demographic research.

(13) To conduct the ‘middle class family living survey’
utilising the N.S.S. field agency.

(14) To undertake special work, as and when it arises,
whether it be at the request of the Central Government
Ministries or of the State Governments.

(c) Directorate of National Sample Survey. Limitations of cost and
man-power, the hurdles in the way of expeditious collection
of data on a comprehensive scale, gave a prominent place to the
sampling technique. The Directorate of N.S.S. was set up in
January 1950, in the Department of Economic Affairs, Ministry
of Finance, to organise a country-wide multi-purpose National
Sample Survey for collecting data relating to all aspects of the
national economy on a continuing basis required by the National
Income Committee (N.I.C.) and the Planning Commission and
other Ministries of the Government of India. In 1957, the Directorate
was transferred to the Cabinet Secretariat. The responsibility
for the technical guidance and tabulation of data collected
by the Directorate rests with the Indian Statistical Institute
(I.S.I.), Calcutta. The National Sample Survey Reports are
regularly (round-wise) published by the Directorate.

This Directorate is responsible for the collection of statistical
data by the method of random sampling in the various sectors
of national economy like socio-economic surveys with all-India
coverage. The collection of data on a random sampling basis
by household enquiry covers demographic and socio-economic
conditions such as composition of households, pattern of income
and expenditure, land utilisation, crop cutting experiments,
agricultural prices, employment and unemployment, etc. Since
1953 the N.S.S. has also taken over the work of large scale
sample survey in the field of agricultural statistics which was
previously conducted by the I.C.A.R.

(d) Directorate of Industrial Statistics, Calcutta. This Directorate
was set up in 1944. Since 1946 it has been conducting the
annual Census of Manufacturing and preparing a monthly index
of industrial production. The functions of this directorate are :

(i) To guide and co-ordinate the conduct of the Census of
Manufacturing Industries in the various States and to
tabulate, consolidate and publish the data collected.

(ii) To collect monthly statistics of production of selected
industries of India and publish them.

(iii) To compute the monthly index of industrial production
in respect of the selected industries referred to in (ii).

(iv) To furnish information relating to industries on inquiries
from the public and various other bodies. It functions
as an industrial intelligence unit so far as the Indian
industries are concerned.

The work relating to the compilation of the statistics of
production, imports, allotment, distribution, etc., of iron and
steel, was transferred to the office of the Iron and Steel Controller
with effect from 1st May, 1957.

2. Ministry of Commerce and Industry

(a) Department of Commercial Intelligence and Statistics, Calcutta.
This Department was established in 1895 for the collection
and publication of the principal series of statistics. Most of
the statistical work of the Government of India was centralised
in this department till the beginning of World War II.
However, many of its functions have since been transferred to
the concerned Ministries, mainly due to the formation of statistical
units in the different Central Ministries during the war.
The department is responsible mainly for Trade Statistics and
now performs the undermentioned functions :

(1) It renders commercial intelligence services, viz.,

(a) Collect and furnish commercial information required
by Government and the trade.

(b) Mediate in commercial disputes between Indian and
foreign firms with a view to bringing about amicable settlement.

(c) Grant trade introductions.

(d) Maintain a register of firms in India and enter
therein relevant information relating to firms.

(e) Maintain a commercial library and reading-room for
the use of the public.

(f) Extend assistance to commercial concerns with a
view to stimulating the foreign trade of India, particularly the
export of Indian produce and manufactures.

(g) Disseminate commercial information received from the
Indian Government Trade Representatives abroad.

(h) Publish the weekly Indian Trade Journal.

(i) In general, assist persons engaged in trade and industry.

(2) Compile and publish statistics of trade, shipping, etc.,
in the publications issued by the department.

(3) Besides the above, the Department also :

(a) Compiles statistics relating to jails in India and furnishes
them to the Ministry of Home Affairs,

(b) Compiles special statistics of various kinds required by the
Government, the Reserve Bank of India and other bodies, and

(c) Issues press notes regarding trade, customs, etc.

Its important regular publications are :

(1) Indian Trade Journal (Weekly).

(2) Accounts Relating to the Inland (Rail and River-borne)
Trade in India (Monthly).

(3) Monthly Statistics of the Foreign Trade of India.

(4) Annual Statement of the Foreign Trade in India
(Vols. I and II).

(b) Office of the Economic Adviser to the Government of India,
New Delhi. Established in 1933, the office of Economic
Adviser is attached to the Central Ministry of Commerce and
Industry. Before the formation of the C. S. O., it functioned
as the central co-ordinating authority in the field of statistics
for the Government of India. At present it maintains wholesale
price indices and price data in general, and acts as the
co-ordinating unit between various statistical units of the
Ministry and as liaison between the Commerce and Industry
Ministry on the one hand and the Planning Commission and
Central Statistical Organisation on the other.

Its regular publications are :

(1) Wholesale Price Index (Weekly).

(2) Wholesale Price Revised (Weekly).

(3) Basic Statistical Material regarding Foreign Trade,
Production and Prices (Monthly).

(c) Statistical Division, Office of the Chief Controller of Imports
and Exports, New Delhi. This Division was established in 1947
with two separate offices of the Chief Controller of Exports and
the Chief Controller of Imports for the collection of statistical
data regarding the day-to-day working of the two offices. Later
on the two offices were amalgamated (1951) as the work of the
trade control organisation expanded considerably. There are
now 10 branches in the Division and each branch has been
assigned a distinct function to perform, such as imports, exports,
import licensing, foreign exchange estimates, weekly bulletin of
import and export licences, etc.

The Division collects the data mainly with a view to meeting
the needs of policy for the issue of import and export
licences. For framing policy regarding export trade control,
it collects data of actual exports and export licensing. On the
import side, statistical data are collected for 1,000 items of the
import trade control schedule and the value of licences issued
itemwise and currencywise. The division also supplies statistical
data on trade to Import and Export Advisory Councils and
international organisations on foreign trade like GATT, ECAFE,
etc. Its regular publication is the Weekly Bulletin of Imports
and Exports Trade Control.

3. Ministry of Finance 

Reserve Bank of India. The Reserve Bank of India has a full-fledged
department of research and statistics. Its Division of
Statistics compiles statistical data for the Bank's publications 
and supplies statistical information on banking, currency, 
finance, etc., for the use of the Bank, the Government and 
international bodies like the International Monetary Fund, etc. 
The Division also publishes Index numbers of prices of indus¬ 
trial securities and their yields. It analyses company accounts 
and collaborates with the other Divisions of the Department 
particularly in the surveys regarding sampling and other 
technical work. Rural credit survey in selected districts is 
conducted annually by the Rural Economics Division of the 
Department in collaboration with the Division of Statistics. 
The Division of International Finance of the Department is
responsible for the compilation and refinement of India's
balance of payments statistics, including data on capital
transactions, visible and invisible trade, foreign exchange reserves
and changes in the country's foreign assets and liabilities. For 
the formulation of banking and credit policies of the Bank, a 
number of surveys of banking statistics is conducted by Division 
of Banking Research of the Department. 

Some of the important regular publications of the depart¬ 
ment are : 

(1) Statistical Tables Relating to Banks in India (Annual). 

(2) Report on the Trend and Progress of Banking in 
India (Annual). 

(3) Report on Currency and Finance (Annual). 

(4) Review of Co-operative Movement in India (Annual). 

(5) Reserve Bank of India Bulletin (Monthly). 




4. Ministry of Food and Agriculture

(a) Directorate of Economics and Statistics. The setting up of 
a unified Directorate of Economics and Statistics under the 
Ministry of Food and Agriculture in 1945 proved conducive to 
the integration of available information in the field of agricultural
economics and statistics. It is responsible for the collection,
compilation and publication of a wide variety of agricultural
statistics on an all-India basis. The Directorate also
collects agro-economic data and advises the Ministry in the 
formulation of agro-economic policy. 

The following are some of the important regular and ad hoc 
publications of this directorate : 

Regular Publications ( Annual ) 

1. Abstract of Agricultural Statistics, India. 

2. Indian Agricultural Statistics (Vols. I & II).

3. Estimates of Area and Production of Principal Crops in
India (Vols. I & II).

4. Indian Forest Statistics (Vols. I & II).

5. Indian Land Revenue Statistics.

6. Agricultural Prices in India.

7. Agricultural Wages in India.

8. Bulletin of Food Statistics.

9. Indian Agriculture in Brief.

Ad Hoc Publications 

1. Agricultural Legislation in India (Vols. I to VII),
1951-56.

2. Indian Crop Calendar, 1940, 49, 51, 65.

3. A Bibliography of Indian Agricultural Economics,
1954.

4. Studies in Agricultural Economics (Vols. I & II), 1954.

(b) The Indian Council of Agricultural Research (I.C.A.R.).
Established in 1929, the I.C.A.R. has a fully equipped
statistical wing under the Statistical Adviser. The functions of
the wing are :

(i) to advise on the planning of agricultural and animal 
husbandry experiments, 




(ii) to scrutinise statistical programmes and progress 
reports of the research schemes of the council and papers 
received for publication in the council journals, 

(iii) to impart training in the agricultural and animal 
husbandry statistics, 

(iv) to carry out fundamental research on the application 
of statistical methods to agriculture and animal husbandry 
problems, and

(v) to conduct research in sampling techniques for collection
of agricultural and animal husbandry data.

Since 1932 the Council has been imparting training in 
Agricultural Statistics and has instituted an additional two- 
year course in Agriculture and Animal Husbandry Statistics 
for training professional agricultural statisticians since 1945-46. 
It has done pioneering work, among others, in introducing the
method of random sampling for the estimation of yield of crops
and evolving suitable techniques for experimenting fields, etc.
(Its quarterly publication is Statistical News Letter on the
Activities of the Statistical Wing of the I.C.A.R.)

5. Ministry of Home Affairs 

Office of the Registrar General. The decennial census of population
in the country is conducted by the office of the Registrar
General in the Ministry of Home Affairs. A Census Act was
passed in 1948 to facilitate the census of 1951. In this census
new features like economic classification of population, main¬ 
tenance of lists of households, sample verification of the count 
and preservation of census records and registers in the form of 
District Census Handbooks were introduced. 

The 1961 Census. The census is conducted in close co-operation
with the State Governments, though the responsibility for
the census rests with the Central Government as 'Population' is
assigned to the Centre under the Indian Constitution. The
Indian Census is not only the largest in dimension but also cheaper
and more accurate, and that too in a predominantly agricultural
and illiterate population. This is the result of the co-operation
of the public and the devotion to duty of a vast army of about
6 lakh honorary enumerators including employees of Government
and local bodies, and teachers.

The following are some of the important publications : 

1. Census of India, 1951, Vol. I, Part I-A (Report).

2. Census of India, Paper No. 1, 1953, Sample Verification
of the 1951 Census Report.

3. Census of India, Paper No. 3, 1953, Summary of Demographic
and Economic Data, 1951 Census.

4. Census of India, Paper No. 5, 1953, Maternity Data,
1951.

5. Improvement of Population Data. 

6. Ministry of Labour and Employment

Labour Bureau, Simla. It was set up in 1946. Its functions
are :

(i) Collection and compilation of labour statistics,

(ii) Co-ordinating work of other agencies for collection of
labour statistics, under the Collection of Statistics Act,

(iii) Maintenance of Consumer Price Index Numbers,

(iv) Keeping up-to-date the factual data relating to working
conditions, collected by the Labour Investigation Committee,
and

(v) Conducting researches into specific problems with a
view to furnishing data required for the formulation of policy.

Its regular publications are : 

1. Indian Labour Year Book (Annual). 

2. Large Industrial Establishments (Annual). 

3. Statistics of Factories (Annual). 

4. Working of the Trade Unions Act (Annual). 

5. Report on the Working of The Minimum Wages Act 
(Annual). 

6. Indian Labour Gazette (Monthly).

7. Ministry of Steel, Mines and Fuel

(a) Indian Bureau of Mines. It was established in 1948 and
its main functions are the collection and publication of data
relating to mineral production in India, stock, prices, consumption,
etc., and collection and maintenance of information
regarding world production, trade, mining rules, etc. There
is a statistics branch in the mineral economic division under a
Deputy Mineral Economist.

Mineral production in India is regularly published both 
annually and half yearly. 

(b) Iron and Steel Control Organisation (Department of Iron and
Steel), Calcutta. It was established in 1941. Its functions are :

(i) Compilation of statistics relating to production, distri¬ 
bution and allotment of iron and steel, 

(ii) Maintenance of Statistics relating to import and ex¬ 
port of iron and steel, 

(iii) Collection of statistics of production of iron and 
manganese ore and labour employed in iron and steel industry, 
and 

(iv) Preparation of statements showing requirements of
coal by iron and steel industry. 

(c) Statistical Section, Oil and Natural Gas Commission, Dehradun.
It was set up in September 1956 and the following are its
functions :

(i) to collect statistics of the activities of the Commission 
from the various sections of the Commission and to maintain 
it on proper graphs and charts to show the progress, 

(ii) to obtain India-wide statistics on production, consump¬ 
tion, refining and imports of crude oil and petroleum products, 

(iii) collection of world-wide data on production, refining
and consumption of oil in the most important countries of the 
world, and 

(iv) collection of statistics on the energy consumption and 
future requirements in India. 

(d) The Indian Statistical Institute (I.S.I.), Calcutta. It is a
non-official organisation and was established in 1937. It helps in
developing the Indian system in three different ways :

(i) as a learned society,

(ii) as a centre of research and training, and

(iii) as an agency for conducting large scale statistical
projects.




Since 1938, it has been conducting examinations for the
award of certificates and diplomas of proficiency in statistics.
The technical work relating to the National Sample Survey
is in its charge. It also conducts the International Education
Centre at Calcutta along with the International Statistical Institute
and UNESCO. The Institute now functions as a focal centre
for professional training and research and as the national
statistical and computational laboratory in India. Recently
the I.S.I. has expanded and re-organised its training activities
in order to meet the growing demand for trained statisticians.
A three-year post-graduate course in statistics, a six to nine
months officers' training course in statistics (in collaboration
with the C.S.O.), computer training, etc., are some of the new
training courses organised. It has several branches in the
country and publishes Sankhya.

Population Statistics 

Statistics of population are obtained partly through a 
population census or a survey and partly through the col¬ 
lection of vital statistics. A population census aims at ascer¬ 
taining the total population of a country, its geographical sub¬ 
divisions, its age and sex composition, and its economic 
characteristics. Vital statistics, on the other hand, refer to 
the continuous registration of births, deaths, marriages, etc. 
Its scope is the study of man as affected either by heredity
or by environment so far as the results of this study can be
numerically stated. It comprehends, besides causes of deaths,
accidents, crimes, poverty, prosperity and so on. It has its
biological and sociological sides. It is also referred to as
'Demographic Statistics'.

Conducting a Census. In most countries of the world a population
census is usually taken once in ten years. The census may
be taken either by interview or by tabulating the answers
returned in response to a return-questionnaire. The population
concerned may be enumerated either de facto or de jure. De facto
enumeration means that all persons are counted in the area
where they are physically found on the date of the census. In
case of de jure count, the population of each area is defined as
persons who usually reside in the area regardless of their actual
location on the census date.

The International Conference on Censuses has laid down 
the main elements of a plan for conducting a population 
census, which are enumerated below : 

1. In the first instance it is necessary to promulgate the
legal basis for the census, and to define specifically :

(a) the scope of the census programme,

(b) the office responsible for the census, and

(c) other legal provisions and conditions for the
census.

2. Financial and personnel estimate : 

(a) Preliminary estimates of census expenses, 

(b) Final budget estimates, and 

(c) Final personnel estimate for the operation of the 
census. 

3. Programme of objectives : 

(a) Determination of objectives and general pro¬ 
gramme of a census, 

(b) Preliminary detailed calendar of required
operations.

4. Census organisation and administration ; 

(a) Organisation of census office which will include 
functional and personnel organisation, and 
estimate of equipment, space and materials re¬ 
quired. 

(b) Organisation of field offices which will require 
determination of number and location of field 
office, functional and personnel organisation, 
transportation and communication arrangements 
and supply of equipment, space and material. 

5. Preliminary work of the census which involves deter¬ 
mination of territorial division, delimitation of census areas, 
preparation of the description of dividing lines outlining the 




census enumeration districts, numbering the census blocks and 
preparing the maps for census areas. 

6. Determining the design and the contents of the ques¬ 
tionnaire which involves; 

(a) Study of international recommendations regarding 
census items, 

(b) Study of previous experiences with question¬ 
naires and the process of collection of information, 

(c) Consideration of suggestions from other governmental
agencies and from non-governmental users
of census data.

The questionnaire shall be finalised after conducting an
experimental census. Arrangements shall then be made for
printing and distribution of questionnaires.

7. Plan of enumeration : 

(a) Determination of the basic procedure for the 
collection of the data, 

(b) Plan for control of the quality of the data
collected.

8. Plan of sampling, basis and detail. 

9. Plan of tabulation—principal lines of tabulation, inter¬ 
relation of complete tabulation with sample tabulation and 
tabulation of work schedules, determination of personnel and 
equipment required and issuing of instructions for tallying of 
census results, coding, verification, editing, punching, tabu¬ 
lation, etc. 

10. Plan for publication. 

11. Experimental census ; 

(a) Conducting of experimental census, 

(b) Analysis of pre-census results, 

(c) Revision of questionnaires. 

12. Propaganda 

(a) Designing the publicity programme, 

(b) Opening the publicity campaign, 

(c) Conducting yearly publicity. 

13. Recruitment and training of staff for enumeration.

14. Survey of areas.

15. Distribution of necessary material to each operating
office.

16. Training of staff for compilation, tabulation and
publication.

17. Census enumeration.

18. Receipt and examination of questionnaires and verification
of the quality of the data collected.

19. Publication.

20. Studies and research which involves conducting of
special surveys and analytical studies and preparation of files
on history of the census for future use.

The Problem of Quality in Census Data

The quality of the census data depends to a great extent on
the skill, integrity and experience of the persons entrusted
with the task of planning and conducting the census and on
the adequacy of funds. Reliable figures cannot be expected
in countries where the tradition of census taking is not firmly
established and where the personnel technically qualified for
the planning and field operations are scarce. Such conditions
exist in large parts of the world. Even in statistically advanced
countries, the accuracy cannot be said to be perfect.

Errors creep into the census data from many sources. It
might be that the work has been entrusted to a large number
of enumerators who have little knowledge of statistical
methods and who cannot discriminate between biased and
unbiased data. There might be reasons for people filling the
forms to be interested in concealing facts or telling them
wrongly. A bias, for instance, may arise if the seats in the
legislature or the services in the state are allotted on communal
basis. The movement of population, homeless wanderers and
the beggars cause risk of duplication or omission. People who
live in isolated localities, people on boats and trains, persons
having odd working hours, and persons who do not speak the
principal languages of the country are a source of
under-enumeration. The data regarding age is often subject to gross
inaccuracies for a number of reasons. People are generally
ignorant about their precise age. They have a tendency to
return ages at multiples of 2, 5 or 10. The ages of children
below 5 years of age are very vaguely reported, and the infants
under 4 months of age are sometimes ignored. Then the
confusion between the age at the next birthday and completed
years of age is not uncommon. There is many times deliberate
understatement of age on the part of young women and
overestimation on the part of the older ones. Finally, errors may
occur in occupational classification of the population.
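
The tendency to report ages at round figures can be made visible with a simple tabulation of the last digit of each reported age. The following is only a minimal sketch in Python: the function name and the list of ages are hypothetical, invented for illustration, and are not taken from any census return.

```python
from collections import Counter


def terminal_digit_shares(ages):
    """Return the share of reported ages ending in each digit 0 to 9."""
    counts = Counter(age % 10 for age in ages)
    total = len(ages)
    return {digit: counts.get(digit, 0) / total for digit in range(10)}


# Hypothetical reported ages (illustration only, not census data).
reported_ages = [20, 25, 30, 30, 32, 35, 40, 40, 41, 45,
                 50, 50, 55, 60, 60, 28, 30, 40, 50, 37]

shares = terminal_digit_shares(reported_ages)
# Share of ages ending in 0 or 5; with accurate reporting one would
# expect something near 0.2, so a much larger share suggests heaping.
heaped_share = shares[0] + shares[5]
print(round(heaped_share, 2))
```

If ages were accurately stated, each final digit would account for roughly one-tenth of the returns, so a large excess at 0 and 5 is a sign of the inaccuracy described above.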

The quality of particular census data can be judged on the 
basis of the accuracy of count and the adequacy of coverage 
and accuracy of classification. 

That census data would be the best which contains the
most accurate count of the items concerned. A complete
re-enumeration to judge the accuracy of the count is neither
practicable nor useful. Two identical counts of population
are rarely, if ever, attained for areas of any considerable
magnitude. Completeness of enumeration can, however, be
estimated by a planned verification carried out immediately
after the original enumeration in a scientifically selected
sample of the area. The extent of certain errors and omissions
in census statistics can be ascertained in part by internal
analysis, especially if the census is taken at regular intervals so that
use can be made of figures at consecutive census dates. The
number of persons returned as 14 years old in the 1951 census
in India, for instance, should not exceed the number of those
returned as 4 years old in 1941, making, of course, an allowance
for mortality during this period. Consistency of figures can
also be checked by comparison of sex ratios by age at consecutive
censuses or by comparison of census figures with registered
births and other available data.
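
The cohort comparison between two consecutive censuses can be sketched as a short check. This is a hedged illustration in Python: the function name, the counts and the survival ratio are all assumptions made up for the example, and migration is ignored.

```python
def cohort_is_consistent(earlier_count, later_count, survival_ratio=1.0):
    """Check a cohort across two censuses taken ten years apart.

    earlier_count: persons returned at age x in the earlier census.
    later_count: persons returned at age x + 10 in the later census.
    survival_ratio: assumed share of the cohort surviving the decade
    (1.0 means no allowance for mortality). Migration is ignored.
    """
    return later_count <= earlier_count * survival_ratio


# Hypothetical counts (illustration only).
aged_4_at_earlier_census = 10_000
aged_14_at_later_census = 9_600

# Without any allowance for mortality the figures look consistent.
print(cohort_is_consistent(aged_4_at_earlier_census, aged_14_at_later_census))
# With an assumed 95 per cent survival the later count is too high.
print(cohort_is_consistent(aged_4_at_earlier_census, aged_14_at_later_census, 0.95))
```

A failed check does not show which of the two counts is wrong; it only flags the age group for closer scrutiny.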

Besides the accuracy of count and adequacy of coverage it 
is also necessary to ensure accuracy and usefulness of the classi¬ 
fication of the data. The accuracy of classification is mainly 
affected by differing interpretations of various terms. A unit 




is excluded from or included in a particular class according to 
what that class is taken to imply. Such difficulties are 
experienced while recording, e.g., the marital status of
persons, their literacy or their occupations. 

According to the concept of statistical tolerance, the data col¬ 
lected may be useful even if it is only approximately accurate. 
But the departure from accuracy must not exceed reasonable 
limits. Efforts must constantly be made to improve the
accuracy of the data. By way of improvements it has been
suggested that de facto count may be replaced by de jure
enumeration. Census schedules may be distributed to the
households in advance of the census date with a request 
that they be filled out and kept ready for collection by 
the enumerators. It is also suggested sometimes that the 
enumerators will take greater interest in the work if the 
method of paying the enumerators is changed from piece 
rate to hourly or daily rate. But the change necessarily
involves high costs which are hard to justify on the basis of an 
intangible improvement in quality. Further, the efficiency of 
the census work should not be allowed to suffer on account of 
insufficient training of the enumerators, inadequate number of 
supervisory personnel and insufficient organisation of field 
work. All efforts should be made to secure local co-operation. 
The schedules which the enumerators are required to handle 
should be simplified. The opinion is, however, against a
reduction in the number of questions for the sake of improving 
the quality of the returns. 

Estimating Intercenaual Population 

Population censuses are taken only at regular intervals of 10
years and they are capable of depicting only the situation just
at the time of the census. But the necessity sometimes arises
of knowing the population figures for a year in between the
two censuses. The use of the figures of the previous or later
census for that year cannot be allowed on grounds of statistical
accuracy because population is never stationary and changes
from year to year. It is, therefore, important that we should
be familiar with the methods of estimating intercensual
population.

1. If the 'natural increase' of population, i.e., the excess
of births over deaths since the last census, and the amount of
emigration and immigration since then are known, it is possible
to state the population of any given year. Account is also to
be taken of any special event during the period, such as war
or famine, that affected the growth of population. The advantage
of the method lies in the fact that the calculation is based
on current events and does not depend upon the assumption
of some fixed rate of increase. But this method would be
unsuitable for a country whose vital statistics are defective. It
is also unsuitable for local estimates of population as in their
case migration statistics do not exist.

2. The population at the two most recent censuses being known, it is
assumed that the population increases in arithmetic progression,
i.e., the same absolute annual increase continues every year.
The method is obviously unsound because firstly it ignores the
actual variations in the incidence of births and deaths in
different years and secondly, it makes no provision for the
increase in the number of parents from year to year. Had there
been a uniform rate of increase in population every year, the
population for any year could be known by the method of
interpolation or extrapolation.

3. Increase in arithmetic progression having been
disapproved, geometric increase has been suggested. But this
is also subject to similar criticisms. The application of geometric
progression to population increase can give no indication
of the declining birth rate in many parts of the world, which is
significantly affecting the rate of increase of population there.

4. It has also been suggested that estimates of intercensual 
population can be based on the local information obtained
from electoral rolls assuming that the proportion of the local 
population to the entire population remains approximately the 
same from one census to another. 

5. The Bowley-Robertson Committee has suggested the use of
Life Tables, given in the actuarial report of the census, for
estimating the growth of population. We can estimate the
number that will be living after a certain number of years beyond
the last census by applying life tables to the number in the
last census. But the method is based on some assumptions
which are not very true. For instance, it is assumed that the
rate of survival remains the same as in the last decade, and that
the statements of ages for all people are accurate. The inaccuracy
in the statement of ages has been admitted by the
Bowley-Robertson Committee itself. It is only to some extent that
the inaccuracy can be smoothened out by the mathematical
process of graduation.
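
The arithmetic behind the first three methods and the life-table method can be sketched briefly. The Python functions below are only an illustration under stated assumptions: all function names and all figures are hypothetical, and the survival ratios stand in for values that would in practice be read from a life table.

```python
def component_estimate(base, births, deaths, immigrants, emigrants):
    """Method 1: last census count plus natural increase plus net migration."""
    return base + (births - deaths) + (immigrants - emigrants)


def arithmetic_estimate(p0, p1, n, t):
    """Method 2: the same absolute increase in each of the n years
    between two censuses; t is the number of years after the first."""
    return p0 + (p1 - p0) * t / n


def geometric_estimate(p0, p1, n, t):
    """Method 3: the same percentage increase in each of the n years."""
    return p0 * (p1 / p0) ** (t / n)


def survivors(cohort_counts, survival_ratios):
    """Method 5: apply assumed survival ratios to each age cohort
    counted at the last census."""
    return [count * ratio for count, ratio in zip(cohort_counts, survival_ratios)]


# Hypothetical figures (in thousands), for illustration only.
p0, p1, n = 1000.0, 1210.0, 10

by_components = component_estimate(1000, births=30, deaths=18,
                                   immigrants=4, emigrants=6)
mid_arithmetic = arithmetic_estimate(p0, p1, n, 5)  # halfway between censuses
mid_geometric = geometric_estimate(p0, p1, n, 5)
expected_living = survivors([100, 80], [0.5, 0.25])

print(by_components, mid_arithmetic, round(mid_geometric, 2), expected_living)
```

For the same pair of census figures the geometric estimate at mid-decade is lower than the arithmetic one, since equal percentage growth adds smaller absolute amounts in the earlier years. Taking t greater than n in either interpolation function gives an extrapolation beyond the later census.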

Population Statistics in India 

The first systematic attempt to record population statistics 
in India dates back to the year 1872. But this census was 
neither uniform nor did it synchronise over the different parts 
of the country. Beginning from 1881, however, the census is 
a regular decennial feature, the latest one having been completed
recently in March 1961. The census operations in
India have been countrywide and their coverage, accuracy 
and methods of data collection have been constantly 
improved. 

The 1941 census saw more changes in method than had
previously taken place in the past 70 years since the census
began. The chief was the abolition of the old one-night
theory of enumeration and the next was the abolition of the
old schedule and the conducting of enumeration straight on
to the slip which was later sorted to produce the various
tables. Under the one-night enumeration the data collected
during the preceding few weeks was to be verified over the
entire length and breadth of the country on one single night.
Obviously the verification of the count in the short span of a
night was difficult and there was always the possibility of the
figures being falsified by unscrupulous enumerators. These
difficulties and dangers were eliminated under the system of
1941 when the period of verification was extended to a few
days. This also resulted in a change in the basis of census.
Under the one-night theory the basis of enumeration was the
place where an individual happened to be at the time of the
census. But with the extension of the period of verification
this basis was replaced by that of normal residence, i.e., the de jure
basis. This change of the basis of enumeration gave an added
importance to the house list which could now yield a population
return very close to the actual census figure. This in its turn
afforded a possibility of verification in case of wrong
enumerations. Another consequence of the change was a considerable
reduction in the number of enumerators. The abolition of the
old schedule and the introduction of enumeration slips was
responsible for reducing a lot of copying work and for a reduction
of printing cost too.

In spite of all these improvements the system of census-taking
in India was far from satisfactory. Its defects were
clearly brought out in the following description of the system
by M. W. M. Yeatts who said, “This system, if that word can be
used here, is in brief that every 10 years one officer is appointed
to conduct a census and some to work under him are appointed
in each province. The States take corresponding action.
These appointments are made at the minimum of time beforehand
and within one year questionnaires have to be settled,
the whole country divided into enumeration units, a hierarchy
of enumeration officers created and trained, millions of
schedules or slips printed and distributed over the face of the
country, the whole process of enumeration carried out and
checked, tabulated and then sent out to offices located in any
odd place that can be found on make-shift pigeon-holes and
furniture, and with temporary staff, rushed through the
presses, and then, in the third year the whole system is wound
up, the officers and office staffs are dispersed and India
makes haste to discard and forget as soon as possible all the
experience so painfully brought together.”

This phoenix-like nature of the Indian census operation has,
however, been changed since September 1948, when a permanent
Census Act was placed on the statute book. As a result of this
Act, there is now a permanent post of Registrar General and
Census Commissioner, and a regular administrative machinery has
been set up for census operations, whose structure is as
follows:

          Registrar General and Census Commissioner
                             |
             Superintendent, Census Operations
                             |
                  District Census Officers
                             |
            +----------------+----------------+
            |                                 |
        Urban Area                        Rural Area
            |                                 |
      Superintendent                       Kanungo
            |                                 |
     Circle Supervisor                     Patwari
            |                                 |
     Block Enumerator                 Block Enumerator
                                    (Village Headmaster
                                     or School Teacher)

Information Collected by the 1951 Census

The following particulars were ascertained in respect of
every person who was enumerated:

(1) Name, relationship to head of household, birth place,
sex, age and marital status,

(2) Household economic status, employment status (if
any), principal means of livelihood and subsidiary means of
livelihood (if any), and

(3) Nationality, religion, membership of ‘special group’
(if any), mother tongue, bilingualism (if any), literacy and
educational standard, and particulars of displacement (for
displaced persons only).

In addition to these items, which were common to all parts
of India, one other item was prescribed for each State by the
State concerned.

A comparison of the above with the international list of
questions as recommended by the committee of U. N. experts
on population census would reveal that the Indian questionnaire
covered the entire international list with one exception,
viz., fertility data. Even these data were collected in three
of our States with a population of nearly 70 million. In
addition to the international list our questionnaire asked
three more questions relating to (i) Displaced persons,
(ii) Economic status, and (iii) Means of livelihood.

The 1951 census is important mainly from two points of
view. Firstly, the census questionnaire was so framed as to
render possible international uniformity and comparability of
population statistics. Secondly, it marks a significant advance
over the previous census in that it has laid emphasis on
economic data rather than on information regarding the religious
faiths of the people. The 1941 census was absolutely silent on
occupational categories, which form a very important part of
the population census of 1951. As was pointed out in the
Census of India, Paper No. 2, published by the Registrar
General, “In the 1951 census, though information was collected
on the religions of citizens as returned by them, unlike the
past, enumeration records were not sorted out on the basis of
religion.” Such sorting in the previous census yielded a
population figure for every village or town broken into
population groups, differentiated by religion. These became the
units for the further sorting of the census data. Thus data
about age, marital status, literacy, etc., were prepared so as
to exhibit figures for persons professing different religions
separately.

A different procedure for tabulation has been adopted in
the 1951 Census. The enumeration record is first sorted out
on the basis of the principal means of livelihood of every
citizen, as recorded at the census. These have yielded figures
for each village or town for the eight main livelihood classes.
These have become the units for further sorting of census data.
The information regarding population groups differentiated by
religion is thus restricted to ascertainment of numbers only.
This shows the bent of the census towards depicting as clearly
as possible the economic structure (or livelihood pattern) of
the country.

To depict the economic structure of the country, enquiries
have been made about two things: (i) Economic Status, and
(ii) Means of Livelihood.

(i) Economic Status. The main question asked was: “Are
you a self-supporting person, a non-earning dependent or an
earning dependent?” Economic status was taken to imply the
status of the person in the economy of the household to which
he or she belongs. If the cost of maintaining himself is wholly
met by the income of the person concerned, he is to be regarded
as a self-supporting person. If the cost of his maintenance is
partly met by his own income and partly by that of some other
member of his household, then he is classed as an earning
dependent. Lastly, if the cost of his maintenance is wholly
met by the income of some other member of the household,
then he is called a non-earning dependent.
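
The three definitions above amount to a simple decision rule. A
minimal sketch in Python follows; the function name, its two
arguments and the idea of comparing two money amounts are
assumptions made for illustration, since the census recorded only
the respondent's answer and not the amounts.

```python
def economic_status(own_income, maintenance_cost):
    """Return the 1951 economic-status class for one person.

    own_income       -- the person's own income (hypothetical units)
    maintenance_cost -- the cost of maintaining that person (same units)
    """
    if maintenance_cost <= 0:
        raise ValueError("maintenance_cost must be positive")
    if own_income >= maintenance_cost:
        # Cost of maintenance wholly met from the person's own income.
        return "self-supporting"
    if own_income > 0:
        # Met partly by own income, partly by another household member.
        return "earning dependent"
    # Wholly met by the income of some other member of the household.
    return "non-earning dependent"
```

For instance, a person earning 30 against a maintenance cost of 80
would be classed as an earning dependent under this sketch.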

These questions provide us some interesting information.
Out of the total of 3,566 lakhs of people, 2,143 lakhs (or 60.1
per cent) have been classified as non-earning dependents, and
379 lakhs (or 10.6 per cent) as earning dependents. Table
23.1 shows the rural, urban and sex-wise break-up of
non-earning dependents and earning dependents.


TABLE 23.1

                  Non-earning Dependents      Earning Dependents
                  No. in Lakhs  Percentage    No. in Lakhs  Percentage

Rural males            674         45.0            119          7.9
Urban males            152         45.6             15          4.6
Rural females        1,065         73.5            232         16.0
Urban females          252         88.1             13          4.3

Total                2,143         60.1            379         10.6


Non-earning dependency among females is substantially
higher than among males. Again, it is higher among
urban females than rural females. Earning dependency is
important in villages, and more especially among women.

The remaining 1,044 lakhs (or 29.3 per cent of the total
number) are self-supporting persons. Table 23.2 shows the
rural, urban and sex-wise break-up of these 1,044 lakhs.







TABLE 23.2

                     Self-Supporting Persons
                  No. in Lakhs      Percentage

Rural males            706             47.1
Urban males            166             49.8
Rural females          151             10.4
Urban females           21              7.4

Total                1,044             29.3
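
The percentages in Tables 23.1 and 23.2 are shares of each group's
own total across the three economic-status classes. A short sketch
of that calculation, using the numbers in lakhs from the two
tables, is given below. The dictionary layout and the helper name
`shares` are illustrative assumptions, and because the inputs are
rounded to whole lakhs a recomputed figure can differ from the
printed one in the last digit.

```python
# Numbers in lakhs. Each tuple is
# (non-earning dependents, earning dependents, self-supporting persons).
counts = {
    "Rural males": (674, 119, 706),
    "Urban males": (152, 15, 166),
    "Rural females": (1065, 232, 151),
    "Urban females": (252, 13, 21),
}


def shares(row):
    """Percentage share of each class within one group, to one decimal."""
    total = sum(row)
    return tuple(round(100 * n / total, 1) for n in row)


# Column totals across the four groups.
totals = tuple(sum(column) for column in zip(*counts.values()))

print("Totals (lakhs):", totals)
print("All-India shares:", shares(totals))
for group, row in counts.items():
    print(group, shares(row))
```

The column totals reproduce the 2,143, 379 and 1,044 lakhs quoted in
the text, which together make up the 3,566 lakhs enumerated.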


(ii) Means of Livelihood. The main question asked was,
what is the gainful occupation or other source of income
which forms your principal means of livelihood? The subsidiary
question was, have you a secondary means of livelihood?
If so, what is it? Some working rules were laid down
prescribing the manner of recording answers to these questions.
It was, for example, laid down that the means of livelihood of
an earning or non-earning dependent would be the same as that
of the self-supporting person upon whom he or she is dependent.
Further, if one has more than one means of livelihood, that
occupation which provides the greater part of his or her income
would be the principal means of livelihood.
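
The working rule for choosing the principal means of livelihood can
be sketched as picking the source with the largest income. In the
sketch below the function name, the income figures and the choice of
the second-largest source as the secondary means are assumptions
for illustration; the text states only the rule for the principal
means.

```python
def principal_and_secondary(incomes):
    """Pick the principal and secondary means of livelihood.

    incomes -- mapping of occupation name to income from it
               (hypothetical figures, any consistent unit)
    """
    if not incomes:
        raise ValueError("at least one means of livelihood is required")
    # Rank the sources from largest to smallest income.
    ranked = sorted(incomes.items(), key=lambda item: item[1], reverse=True)
    principal = ranked[0][0]
    # Assumption: the next largest source is treated as the secondary means.
    secondary = ranked[1][0] if len(ranked) > 1 else None
    return principal, secondary


print(principal_and_secondary({"cultivation": 600, "weaving": 250}))
```

A dependent would then simply be given the principal means of the
self-supporting person on whom he or she depends.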

Means of livelihood have been divided broadly into two
categories, agricultural and non-agricultural. Persons receiving
their income from agriculture are classified under one or other
of the following four heads: (1) person who cultivates land
owned by him, (2) person who cultivates land owned by
other persons, (3) person who is employed as a labourer by
another person who cultivates land, and (4) person who
receives rent in cash or kind in respect of land which is
cultivated by another person. The self-supporting
non-agriculturists are divided according as to whether they
are: (i) employers, (ii) employees, (iii) independent workers,
or (iv) none of these.

The rural/urban, agriculturist and sex-wise break-up of the
self-supporting population is given below:








TABLE 23.3

                           Self-Supporting Persons
                     Agriculturists           Non-Agriculturists
                  Number                     Number
                  in Lakhs    Percentage     in Lakhs    Percentage

Rural males          566         80.2           140         19.8
Urban males           19         11.4           147         88.6
Rural females        121         80.2            30         19.8
Urban females          4         19.7            17         80.3

Total                710         68.1           334         31.9


From the above table it is clear that ‘rural’ and ‘agricultural’
or ‘urban’ and ‘non-agricultural’ are not identical. There are
agriculturists living in towns, and amongst self-supporting
villagers there are non-agriculturists also.

The 710 lakhs of agriculturists can be further sub-divided
as follows:

TABLE 23.4

                                  Number     Percentage      Percentage
Livelihood classes                  in         of all        of all self-
                                  Lakhs     agriculturists   supporting
                                                             persons

Cultivators of land wholly
or mainly owned                    457          64.4            43.8

Cultivators of land wholly
or mainly unowned                   88          12.3             8.4

Cultivating labourers              149          21.0            14.3

Non-cultivating owners of
land and other agricultural
rent receivers                      16           2.3             1.6

Total                              710         100.0            68.1










The 334 lakhs of self-supporting non-agriculturists, too, may 
be divided into four sections to ascertain the manner in which 
they obtain their livelihood. 

TABLE 23.5

                                  Number     Percentage      Percentage
                                    in       of all self-    of all self-
                                  Lakhs      supporting      supporting
                                             non-agri-       persons
                                             culturists

Employers                           11           3.3             1.1

Self-employed persons
other than employers               165          49.4            15.7

Employees                          148          44.1            14.2

Non-agricultural rentiers,
pensioners and miscellaneous
income receivers                    10           3.0             0.9

Total                              334         100.0            31.9


A study of Tables 23.4 and 23.5 would show that
non-agricultural employees form a distinctly larger proportion
of non-agriculturists than cultivating labourers do among
agriculturists.
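
The comparison drawn from Tables 23.4 and 23.5 can be checked
directly from the numbers in lakhs. The sketch below is
illustrative only: the short category labels and the helper
`share` are assumptions, and shares recomputed from whole lakhs
need not match the printed percentages to the last digit.

```python
# Self-supporting persons, in lakhs, by livelihood class.
agriculturists = {
    "owner cultivators": 457,
    "cultivators of unowned land": 88,
    "cultivating labourers": 149,
    "rent receivers": 16,
}
non_agriculturists = {
    "employers": 11,
    "self-employed": 165,
    "employees": 148,
    "rentiers and pensioners": 10,
}


def share(category, group):
    """Percentage that one category forms of its whole group."""
    return 100 * group[category] / sum(group.values())


labourers = share("cultivating labourers", agriculturists)
employees = share("employees", non_agriculturists)
print(f"Cultivating labourers among agriculturists: {labourers:.1f}%")
print(f"Employees among non-agriculturists: {employees:.1f}%")
```

On these figures employees form roughly twice as large a share of
their group as cultivating labourers do of theirs.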

Among the self-supporting persons (agriculturists as well as
non-agriculturists) are included those persons who earn their
income by engaging in some type of economic activity as well
as those who live wholly on unearned income from property,
pensions, charity, etc. The number of the latter is 26 lakhs,
i.e., 16 lakhs among agriculturists and 10 lakhs among
non-agriculturists. If this is deducted from 1,044 lakhs, the
total number of self-supporting persons, we get 1,018 lakhs.
This figure represents our economically active population. The
census of 1951, as we have already seen, also gives the number
of our semi-active and economically passive population. The
report also provides the distribution of the economically active
population into different divisions and sub-divisions of our
national economy.







The 1951 census was the first since the transfer of power.
The fact that so much improvement could be carried out in
such a short period of three years augurs well for the future.
At this time, when we are planning for raising the living
standard of our people, this mass of data relating to economic
characteristics is bound to render great assistance to our
planners.

Main Features of the Census of 1961

The 1961 census count has been related to the sunrise of
March 1, 1961. The enumeration was spread over a period of
24 days, i.e., from February 10 to March 5. The enumerators
were to visit all the households between February 10 and the
sunrise of March 1 for the initial count, and the remaining five
days were to be devoted to follow-up visits for such corrections
of the information as were necessitated by any birth or
death that occurred between the time the enumerator visited a
household and the sunrise of March 1, 1961. In areas which get
snow-bound as early as November, enumeration was completed
during September-October, 1960.
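
The length of the enumeration period can be verified with ordinary
date arithmetic. The sketch below uses Python's standard `datetime`
module; the variable names are illustrative, and both end dates are
counted as working days, which is an assumption about how the 24
days were reckoned.

```python
from datetime import date

start = date(1961, 2, 10)      # first day of enumeration
reference = date(1961, 3, 1)   # the count relates to sunrise on this day
end = date(1961, 3, 5)         # last day of follow-up visits

# Initial count: February 10 up to (not including) March 1.
initial_days = (reference - start).days
# Follow-up visits: March 1 to March 5, both days included.
followup_days = (end - reference).days + 1
# Whole period, both end dates included.
total_days = (end - start).days + 1

print(initial_days, followup_days, total_days)
```

The initial count thus occupies 19 days and the follow-up visits
the remaining five.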

House-list. The preparation of the house-list is the first step
in a census operation. In earlier censuses also house-lists were
prepared. But their design was not uniform and only the
minimum essential information required for enumeration was
collected. In the present census a uniform house-list was
adopted for the entire Indian Union Territory. The information
obtained at the house-listing operation is considerably larger
and includes the following items:

1. Building number (Municipal or Local authority or
Census number, if any).

2. Building number with sub-numbers for each Census
house.

3. Purpose for which Census house is used.

4. Name of establishment or proprietor.

5. Name of product(s), repair or servicing undertaken.

6. Average number of persons employed daily.

7. Kind of fuel used.

8. Material of wall.

9. Material of roof.

(Questions 4, 5, 6 and 7 apply if the census house is
used as an establishment, workshop or factory.)

Besides these the house-list is to give information with regard
to the number of Census households, the name of the head of the
household and the persons residing in the household. Data
relating to establishments and building materials have been
obtained at the request of the Ministry of Commerce and Industry
and the National Buildings Organisation respectively. The
enumeration of persons residing in a household at the time of
the house-listing operation provided a preliminary estimate of
India's population.

The Household Schedule 

The census operation involved the filling-in of two forms, a
Household Schedule and an Enumeration Slip. The Household
Schedule was a new feature of the 1961 Census. (In the
1941 and 1951 Censuses the enumeration slip alone was used.)
This schedule comprised two parts. The first part (called the
household schedule), besides indicating whether the household
belonged to a Scheduled Caste or Scheduled Tribe recognised as
such by a State, and also its location in the mohalla, ward
or village, as the case may be, was designed to collect data on
the chief economic activities of the household, viz.,
cultivation and household industry. As regards cultivation, it
obtained information about the extent of land actually
cultivated by the household, classified as (a) land owned or
held from Government, and (b) land held from private persons or
institutions for payment in money, kind or share. The cultivated
land area was further classified according to the local name
of the right of ownership or cultivation. Besides this,
information was collected about land not actually cultivated by
a household but given to a private person for cultivation for
payment in kind or share. The data collected under this will
throw light on the size of holdings and the form of tenure,
and will be helpful in the formulation of any scheme for the
re-organisation of agriculture. As regards household industry,
information was obtained relating to the nature of the industry
and the number of months in the year during which it is
conducted. The household industry has been defined as one
which is not on the scale of a registered factory, and which is
conducted by the head of the household himself and/or mainly
by members of the household at home or within the village in
rural areas and only at home in urban areas.

In addition to this, data have also been collected regarding 
the extent of employment, both of the family members and 
hired workers, in household cultivation and household industry 
separately. The number of family members engaged was 
given separately for males and females. 

All these data, when tabulated by suitable class intervals
of land under cultivation, number of months during which
household industry is run, and input of family and hired labour,
and when cross-classified by information about land cultivated
and household industry, will yield a wealth of useful
information. The Household Schedule will also provide an
estimate of the extent of under-employment in cultivation and
household industry.

The second part of this schedule, known as the ‘Census
Population Record’, was intended to give for each individual
enumerated in the household such essential information as name,
relationship to the head of the household, age, marital status
and description of work, if working. This record, if corrected
periodically for births, deaths and migrations, can serve as
the basis for making estimates of the population until 1971.

The Enumeration Slip was used to obtain information
regarding demographic, social and economic characteristics
of each individual enumerated. It contained the usual
demographic questions, viz., name for identification,
relationship to the head, sex, age, marital status, and birth
place. A notable innovation that was made in 1961 was the
introduction of two demographic questions to study rural-urban
migration in more detail. The two new questions
were: Question 4 (b) whether born in rural or urban area,
and Question 4 (c) duration of residence, if born elsewhere.
These two questions together will provide a wealth of
information on rural-urban migrations. In earlier censuses it
was possible to study inter-district migration within any State
as also inter-State migration, but there was no way of studying
rural-urban movement of the population. Nor was it possible
to determine the extent of migration during a certain period.
The quantum of internal migrations determined from the
census denoted ‘life-time’ migration only. But with the
introduction of these two new questions, it would be possible to
study rural-urban migration for different periods. In the
context of planning and rapid urbanisation during recent years
the utility of such data cannot be over-emphasised.

The questions on social characteristics had the usual census
pattern, and pertained to nationality, religion, scheduled castes
or scheduled tribes, literacy and education, mother-tongue
or other languages known. No information about castes was
collected, and the data regarding scheduled castes and
scheduled tribes were obtained in view of the constitutional
guarantees given to these particular communities. Special
arrangements were also made for obtaining detailed information,
viz., scientific qualifications and conditions of employment,
of technical personnel.

The data collected on economic characteristics are perhaps
the most important and are intended to serve current
requirements. The questions used for the purpose of obtaining
the economic data were: (8) working as a cultivator, (9) working
as agricultural labourer, (10) working at household industry,
(10-a) if employee, (11) doing work other than included in (8),
(9) and (10), (11-c) class of worker. Thus
information has been collected to determine whether an
individual was working as a cultivator, as a rural labourer, at
a household industry or doing any ‘other work’. This means
that the entire population could now be divided into two
categories: (i) working, and (ii) non-working, and the old
classification on the basis of economic independence has been
given up. This has facilitated, for the first time, the
inclusion of the family worker, who goes unremunerated in
cash, in the total working population. For persons working
in household industry or doing ‘other work’ information has
also been collected regarding their status. It was, for example,
inquired from a person working at household industry whether
he was an employee or not. Persons engaged in ‘other work’
were classified as ‘employers’, ‘employees’, ‘single workers’, or
‘family workers’.

Further, in the case of each individual working at ‘household
industry’ or ‘other industry’ information about the ‘nature
of the work done’ and the nature of the industry was obtained.
This will facilitate a two-fold classification of all workers
(a) in their individual occupations, and (b) in the industries in
which they serve. Separate tables on occupation and industry,
when cross-classified by four age groups (corresponding
roughly to 20-year periods), educational qualifications, sex
and class of worker, will yield a wealth of revealing information
about our working force.

Partly to make sure that a person who does not declare
himself as a worker is not actively engaged in any economic
activity, and partly to ascertain the activity of the
non-working population, a question (12), viz., activity of a
non-working person, was included in the slip. A non-working
person was thus listed in one of the following categories:

(i) Housewives,

(ii) Students,

(iii) Infants,

(iv) Retired persons, receivers of rent or interest,

(v)

(vi) Inmates of penal, mental and charitable institutions,

(vii) Persons seeking employment for the first time,

(viii) Persons employed before, now unemployed, and
seeking work.




Findings of the Census

Some important findings of the 1961 census are given below:

Total Population. The total population enumerated at the
1961 census is 439,235,082 (439 million approximately). The
division of this population between males and females is as
follows:

     Males      226,293,620   (app. 226 million)
     Females    212,941,462   (app. 213 million)

The 1961 census covered the entire country with the exception
of portions of Jammu and Kashmir currently under the
occupation of Pakistan and China.

Sex Ratio. The all-India sex ratio, i.e., the number of
females per 1,000 males, was 940 according to the 1961 census.
The sex ratio is appreciably lower north of latitude 22° than
south of it. This is reflected also in the sex ratio of urban
areas in the north and south. While very few towns (and
certainly no cities) in the north have their sex ratios anywhere
near par, there are comparatively few towns or even cities in
the south, especially in the States of Andhra Pradesh, Mysore,
Kerala and Madras, where the sex ratio drops much below 900
females per 1,000 males. This is a matter of much sociological
interest to urban planning in India.
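
The sex ratio as defined above follows directly from the male and
female totals of the 1961 census. A minimal sketch is given below;
the function name is an illustrative assumption. The exact quotient
lies between 940 and 941, so the whole-number figure one quotes
depends on whether it is rounded or truncated.

```python
males = 226_293_620
females = 212_941_462


def sex_ratio(female_count, male_count):
    """Number of females per 1,000 males."""
    return 1000 * female_count / male_count


ratio = sex_ratio(females, males)
print(f"Total population: {males + females:,}")
print(f"Females per 1,000 males: {ratio:.2f}")
```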

Occupational Classification. The 1961 census reveals that
72.8 per cent of the total population is working in the primary
sector, 11.7 per cent in the secondary sector and the remaining
15.3 per cent in the tertiary sector.


Growth of Population. Population of India at each census
(1901-61) showing decennial per cent variation is as follows:

     Year        Population        Decennial per cent
                                       variation

     1901       236,281,243               ..
     1911       252,122,410             + 5.73
     1921       251,332,261             - 0.31
     1931       279,015,498             +11.01
     1941       318,701,012             +14.22
     1951       361,129,622             +13.31
     1961       439,235,082             +21.50



The table reveals that during a period of ten years (1951 to
1961) there has been an increase of 21.5 per cent in the
population, the average annual rate of growth being 2.15 per
cent.
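
The decennial per cent variation is the change between two
successive censuses expressed as a percentage of the earlier count,
and the 2.15 per cent quoted above is the ten-year increase divided
by ten. The sketch below recomputes the variation for the censuses
of 1931, 1941 and 1951 from the populations printed in the table,
and also shows a compound annual rate for comparison. The compound
rate is not used in the text and is added here only as an
illustration; the function names are assumptions.

```python
census = {
    1921: 251_332_261,
    1931: 279_015_498,
    1941: 318_701_012,
    1951: 361_129_622,
}


def decennial_variation(previous, current):
    """Percentage change from one census to the next."""
    return 100 * (current - previous) / previous


def simple_annual_rate(decennial_percent, years=10):
    """Arithmetic average: the ten-year change spread evenly over the years."""
    return decennial_percent / years


def compound_annual_rate(decennial_percent, years=10):
    """Constant yearly rate that compounds to the ten-year change."""
    return 100 * ((1 + decennial_percent / 100) ** (1 / years) - 1)


for year in (1931, 1941, 1951):
    change = decennial_variation(census[year - 10], census[year])
    print(year, f"{change:+.2f}")

print(f"Simple average for 1951-61:   {simple_annual_rate(21.5):.2f}")
print(f"Compound average for 1951-61: {compound_annual_rate(21.5):.2f}")
```

The compound figure comes out a little below the simple average
because each year's growth is applied to an already larger base.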

Literacy. The test of literacy for the 1961 census was satisfied
if a person above the age of four could with understanding both
read and write. The 1961 census reveals that the number of
literates per 1,000 persons (of all age groups) was 339 among
males and 128 among females, the average for the whole
population being 237. In terms of percentages, 23.7% of India's
population was literate on March 1, 1961 and the rest, i.e.,
76.3%, was illiterate (i.e., could not both read and write).
Over the period 1951 to 1961 literacy had increased at an
average rate of 0.7% per year for the population as a whole:
0.9% for males and 0.5% for females. The literacy rate is the
maximum in Kerala, being as high as 46.2%, and the lowest in
Sikkim. Other States of very low literacy ratio are Himachal
Pradesh, Madhya Pradesh and Rajasthan.
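
The overall literacy figure is a population-weighted average of the
male and female figures, so any one of the three can be recovered
from the other two. As a consistency check, the sketch below solves
for the male rate implied by the overall figure of 237 and the
female figure of 128 per 1,000, using the male and female totals of
the 1961 census. The function name is an illustrative assumption,
and since the published rates are rounded the result is only
approximate.

```python
males = 226_293_620
females = 212_941_462


def implied_male_rate(overall_rate, female_rate, male_count, female_count):
    """Male literates per 1,000 implied by the overall and female rates."""
    total = male_count + female_count
    # overall * total = male_rate * males + female_rate * females
    return (overall_rate * total - female_rate * female_count) / male_count


male_rate = implied_male_rate(237, 128, males, females)
print(f"Implied male literates per 1,000: {male_rate:.1f}")
```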

Urban and Rural Population. Out of the total population of
439 million, 82.2% constitute the rural population and the rest,
i.e., 17.8%, the urban population. As in previous censuses, in
the 1961 census also all Municipal Boards, Cantonment Boards and
Notified Areas were deemed urban areas. In addition, all
other places (a) having a population in excess of 5,000,
(b) where three-fourths of the population depended on
non-agricultural means of livelihood, and (c) where the density
of population exceeded 1,000 persons per square mile were also
treated as urban areas.
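
The definition of an urban area can be written as a short rule.
The sketch below is an illustration only. It assumes that a place
without municipal, cantonment or notified-area status must satisfy
all three tests (a), (b) and (c) together, which the text does not
state explicitly, and the function and parameter names are
hypothetical.

```python
def is_urban(population, non_agricultural_share, density_per_sq_mile,
             has_civic_status=False):
    """Apply the 1961 urban-area tests to one place.

    population             -- number of inhabitants
    non_agricultural_share -- fraction (0 to 1) depending on
                              non-agricultural means of livelihood
    density_per_sq_mile    -- persons per square mile
    has_civic_status       -- True for Municipal Boards, Cantonment
                              Boards and Notified Areas
    """
    if has_civic_status:
        # Deemed urban regardless of the three tests.
        return True
    # Assumption: all three tests must hold together.
    return (population > 5000
            and non_agricultural_share >= 0.75
            and density_per_sq_mile > 1000)


print(is_urban(8000, 0.80, 1500))   # passes all three tests
print(is_urban(8000, 0.40, 1500))   # mainly agricultural
```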

Density of Population. The density of population measures
the number of persons per square mile. The all-India density
of population according to the 1961 census has gone up from 316
in 1951 to 384. The 1961 census records that areas of varying
densities freely cut across the boundaries of States, and
generally conform to natural features. For example, the region
of highest density, except that of Kerala, is still the
Indus-Jamuna-Ganga Doabs and the West Bengal basin. Stemming
from the narrow strip of Gurdaspur, Amritsar, Jullundur and
Ludhiana in the North West, the broad belt of Uttar Pradesh
merges with North Bihar and finally with West Bengal, density
in the entire region being seldom below 500 per square mile and
often above 1,000 in spacious bands astride the river.

Population of Jammu and Kashmir. The population of Jammu
and Kashmir was for the first time included in the census of
1961. The census reveals that the total population of Jammu
and Kashmir in 1961 was 35,83,585, of which 19,02,902 were
males and 16,80,683 females.

Labour Statistics

It is only in recent times that the question of having
accurate and complete statistical information on labour in
various countries has been given proper attention. This has
been due partly to the powerful labour movement and partly
to the lead given in this connection by the International
Labour Organisation.

Labour statistics are usually collected by the following:
(i) The State. Statistical data come as a result of the labour
laws. (ii) The employers and employees. They collect
statistics in order to make their propaganda campaigns
impressive and effective and also to plead their respective
cases before the Government and the public. They collect
statistical data also for organising welfare activities.
(iii) The research students in the Universities, who make a
balanced study of the complicated problems of labour.

In India labour statistics are largely the by-products of
the administration of the various labour laws like the Factories
Act, the Payment of Wages Act, the Minimum Wages Act, etc.
The employers and employees in our country do not generally
collect such data.

The Labour Bureau, Ministry of Labour, Government of
India, which was set up in the year 1946, is responsible for the
collection and publication of statistics relating to labour.

Statistical data available in our country can be grouped
under the following heads:

1. Statistics of Employment and Unemployment including
statistics of absenteeism and labour turnover;

2. Trade Union Statistics;

3. Statistics of Industrial Injuries;

4. Statistics of Industrial Disputes; and

5. Statistics of Wages.

1. Employment

Overall employment data for the country were collected
under the decennial population censuses conducted in 1951
and 1961. Brief information with regard to these has been
given under the section on population statistics discussed
elsewhere. Here it is intended to discuss serial statistics of
employment collected by the method of establishment reporting
for the organised sectors of the economy.

Employment in Factories. Statistics of employment in
factories are collected under the Factories Act, 1948. The
information is compiled from the annual returns submitted by
the factories to the Chief Inspector of Factories of their
respective States and, with regard to factories not submitting
returns, from the estimates of the Chief Inspector of
Factories based on inspection reports. The Chief Inspector
of Factories furnishes to the Labour Bureau returns showing
the industry-wise classification of the employed in his State.
The Labour Bureau compiles the information on an all-India basis
and publishes it in the Indian Labour Year Book. The
annual returns filed under the Factories Act give information
relating to (i) average number of workers employed (men,
women and children separately); (ii) days worked; (iii) number
of hours worked; and (iv) intervals.

Employment in Mines. Annual statistics relating to the average
number of persons (including wage earners, salaried employees,
foremen and apprentices) employed in and about mines and
quarries covered by the Mines Act are collected by the Chief
Inspector of Mines under the statutory provision and published
in his annual reports on the working of the Mines Act.
From 1951 these statistics relate to the whole of the Indian
Union. The total number of persons employed is classified
according to ‘Minerals Produced’. The total employed in each
type of mineral production is classified according to State.
There is a further classification as ‘underground’,
‘open-working’ and ‘surface’ workers. The last two categories
are classified according to sex. A monthly series of the average
daily number of workers employed in coal mines is also published
by the Chief Inspector of Mines.

Employment in Plantations. Annual statistics relating to
employment in plantations are compiled and published by the
Directorate of Economics and Statistics, Ministry of Food and
Agriculture, in its publications entitled ‘The Indian Tea
Statistics’, ‘The Indian Coffee Statistics’ and ‘The Indian
Rubber Statistics’. The returns from individual estates are
collected through the State Government agencies. The figure
of average daily employment is obtained by dividing total
attendances during a year by a standard figure of 300 working
days. Information in regard to the extent of employment of
women and children in plantations is available only for tea
estates in Assam from the figures published by the Controller
of Emigrant Labour in his annual reports.
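
The plantation figure of average daily employment is a simple
quotient. A minimal sketch is given below; the function name and
the attendance total of 150,000 are hypothetical and serve only to
show the arithmetic.

```python
STANDARD_WORKING_DAYS = 300  # standard figure used for plantations


def average_daily_employment(total_attendances,
                             working_days=STANDARD_WORKING_DAYS):
    """Total attendances in a year divided by the standard working days."""
    if working_days <= 0:
        raise ValueError("working_days must be positive")
    return total_attendances / working_days


# Hypothetical estate with 150,000 attendances recorded in the year.
print(average_daily_employment(150_000))
```

Because the divisor is fixed at 300 whatever the number of days an
estate actually worked, the result is a conventional measure rather
than a true daily average.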

Besides this, information with regard to employment in
‘Posts and Telegraphs’, ‘Railways’, ‘State Motor Transport’,
‘Seamen’, ‘Municipalities’, ‘Building and Construction’
(CPWD) and ‘Central Government Establishments’ is also
available in the Indian Labour Year Book.

The above information shows that employment data are
available for a small sector only. In regard to the major
sector, which accounts for the bulk of employment in India,
figures of employment are not collected; e.g., there is no
systematic information available with regard to employment
in agriculture, cottage industries, commerce and transport,
storage and communication, construction services, etc. As
regards these, the data collected at the time of the decennial
census are the only source of information. Estimates of
employment in weaving are, however, available in the ‘Report of
the Fact Finding Committee (Handloom and Mills, 1947)’.
Scattered data are also available in respect of employment for
some cottage industries in certain regions as given by local
field enquiries or surveys.

Statistics of Unemployment

Statistics of unemployment are practically non-existent in
our country. There is no statutory obligation for registration
of the unemployed. There is no unemployment insurance, and the
trade unions, which could provide another source of
unemployment statistics, have not taken any initiative in this
regard. In countries where trade unions generally take the
responsibility of finding employment for their members or pay
unemployment benefits, information with regard to the extent
of unemployment can be obtained from them also.

In the absence of anything better the series showing the
number of applicants on Live Registers of Employment
Exchanges at the end of each month is taken to be an indicator
of the unemployment situation in the country. But there are the
following defects in this series:

(a) Registration is purely voluntary and is not subject to
any incentive except that persons desirous of jobs (other than
highly technical, professional, scientific and administrative in
nature) under the Central Government cannot be considered
unless they are registered and are recommended by an
Employment Exchange.

(b) The rural population is generally under-represented 
because of the distance of the Employment Exchange from 
their places of residence. 

(c) Persons seeking alternative employment while on some
job are also allowed to register themselves at Employment
Exchanges.

(d) A number of registered persons do not inform the
Exchanges on getting a job through their own efforts.

Statistics of Absenteeism 

Absenteeism is statistically measured by the percentage of
man-shifts lost due to absence to the corresponding total
man-shifts scheduled to work. Such statistics for selected
industries at important centres are maintained and published by
the Bureau, some of the State Governments and the office of the
Inspector of Mines. The Employers' Association of Northern
India also compiles statistics of absenteeism in regard to certain
industries at Kanpur. The methods followed by the different
agencies, however, are not uniform. These statistics, except
those relating to absenteeism in Mines, are based on voluntary
returns furnished by selected large concerns. Statistics of
absenteeism are available for Cotton, Woollen, Cotton Woollen
and Silk Industries, Engineering, Iron and Steel Industry,
Cement and Match Factories and Leather Industry.
Information in regard to percentage of absenteeism classified by causes
is also available.

Monthly statistics of absenteeism covering all workers in 
coal mines are compiled and published by the Chief Inspector 
of Mines. 
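
The measure of absenteeism defined in this section is a simple percentage, and a minimal Python sketch may make it concrete. The function name and the figures are hypothetical; only the ratio of man-shifts lost to man-shifts scheduled comes from the text.

```python
def absenteeism_rate(man_shifts_lost, man_shifts_scheduled):
    """Percentage of scheduled man-shifts lost due to absence."""
    if man_shifts_scheduled <= 0:
        raise ValueError("man_shifts_scheduled must be positive")
    return 100.0 * man_shifts_lost / man_shifts_scheduled


# Hypothetical mill: 1,200 man-shifts lost out of 15,000 scheduled.
print(absenteeism_rate(1_200, 15_000))  # 8.0
```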

Statistics of Labour Turnover 

Labour turnover measures the extent to which the old
employees leave and new employees enter into services of an
organisation in a given period. There are, therefore, two
aspects of the problem, viz., the proportion of workers who go
out of employment, and the proportion of workers who enter
into employment in a given period of time. A monthly series of
statistics of labour turnover is available in respect of the Cotton
Industry in Bombay from 1950 onwards. This is compiled by
the Government of Maharashtra.
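
The two aspects of labour turnover can be expressed as two rates. The sketch below is only illustrative: the text does not say what the proportions are taken of, so the use of the average number of workers on the rolls as the base is an assumption, and the function name and figures are hypothetical.

```python
def turnover_rates(separations, accessions, average_workers):
    """Return (separation rate, accession rate) as percentages.

    Assumption: both are expressed per 100 workers on the average
    rolls during the period; the text does not fix the base.
    """
    if average_workers <= 0:
        raise ValueError("average_workers must be positive")
    separation_rate = 100.0 * separations / average_workers
    accession_rate = 100.0 * accessions / average_workers
    return separation_rate, accession_rate


# Hypothetical month: 30 workers left, 45 joined, 1,500 on the rolls.
print(turnover_rates(30, 45, 1_500))  # (2.0, 3.0)
```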

2. Trade Union Statistics

Trade Union Statistics are collected as a result of the
administration of the Indian Trade Unions Act, 1926. Under
this Act registered trade unions are required to submit annual
returns which are compiled by State Governments and supplied
to the Government of India. The data thus received from the
states are published annually by the Government of India in
the form of a brochure. It is not compulsory for all trade
unions to get themselves registered under the Act, and a number
of trade unions function without being registered. The data
with regard to such unions are not available. Besides, some of
the registered trade unions do not submit returns.

Statistics are available about the number of trade unions
registered under the Act, the number of unions that have
furnished the returns, and the membership of the unions
(classified according to sex). Information is also available about:

(i) Number and membership of trade unions classified 
according to States ; 

(ii) Number and membership of trade unions according to 
Industries and sex ; and 

(iii) Distribution of trade unions according to size
(membership).

In the statutory returns that are submitted by the registered
trade unions information is given about their finances also.
Statistical data are available in regard to the income of the
trade unions and the amount of income obtained from different
sources; so also figures are available of expenditure on different
items.

3. Statistics of Industrial Injuries

Statistics of injuries are based on the number of persons
(killed or disabled) involved in industrial or work accidents.
For statistical purposes the number of injuries is the number of
persons receiving injuries as a result of accidents. Such statistics
are collected under the provisions of the Factories Act, Mines
Act, Indian Railways Act and the Indian Dock Labourers' Act.
There are two important measures of injuries, viz.,

(i) Frequency rate, 

(ii) Severity rate. 

The standard methods of calculating these are:

(a) The frequency rate should, if possible, be calculated
by dividing the number of injuries (multiplied by
1,000,000) by the number of hours of working time of
all persons covered.

(b) The severity rate should be calculated by dividing the
number of working days lost (multiplied by 1,000) by
the number of hours of working time of all persons
covered.
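
The two rates defined in (a) and (b) can be computed as in the following Python sketch. The multipliers of 1,000,000 and 1,000 are those given in the text; the function names and the figures for the factory are hypothetical.

```python
def frequency_rate(injuries, hours_worked):
    """Injuries per 1,000,000 hours of working time."""
    if hours_worked <= 0:
        raise ValueError("hours_worked must be positive")
    return injuries * 1_000_000 / hours_worked


def severity_rate(days_lost, hours_worked):
    """Working days lost per 1,000 hours of working time."""
    if hours_worked <= 0:
        raise ValueError("hours_worked must be positive")
    return days_lost * 1_000 / hours_worked


# Hypothetical factory: 12 injuries and 480 working days lost
# in 2,400,000 hours of working time.
print(frequency_rate(12, 2_400_000))  # 5.0
print(severity_rate(480, 2_400_000))  # 0.2
```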

4. Statistics of Industrial Disputes

Statistics of industrial disputes are compiled by the Labour
Bureau on the basis of the reports received from the state labour
departments. All-India statistics are published every month in
the Indian Labour Gazette. Such statistics are collected through
voluntary submission of returns by the employers but official
agencies are also utilised to trace the occurrence of disputes and
to collect fuller details relating to them. These relate to
industrial disputes (both strikes and lockouts) resulting in work
stoppages involving ten or more workers in all sectors of
employment including mines, trade, transport, plantations, etc.
The data available refer to (i) Number of disputes, (ii) Number of
workers involved, and (iii) Number of man-days lost. All this
information is available with regard to each State and each
industry. The disputes have also been classified according to
causes, e.g., Wages and Allowances, Bonus, Personnel, Leave
and hours of work, etc. Disputes are classified according to
duration also.

5. Statistics of Wages

Statistical data in regard to Wages emerge mainly from two
sources, viz.,

(i) The reports of the various committees or commissions
or the reports of the wage-census conducted by
different State Governments at different times.

(ii) The administration of the various labour laws, e.g., the
Factories Act, the Payment of Wages Act, etc.

In the matter of collecting wage-statistics the Government
of Bombay took a lead by conducting a comprehensive wage
census covering all manufacturing industries in 1934. Similar
surveys have been conducted in Bihar, U.P. and Madras.
The Labour Bureau also conducts wage enquiries from time to
time. The report of the Labour Investigation Committee
(known as Rege Committee) provides statistics of wages with
regard to certain industrial centres.

Besides these ad hoc censuses and reports of the Commissions,
valuable statistical information regarding wages is available in
the Annual Report of the Working of Factories Act and the
Annual Report of the Chief Inspector of Mines. The report
on the annual census of manufactures also provides statistics of
wages in different industries covered by the census.
Information on earnings of factory workers emerges also from the
returns under the Payment of Wages Act. This Act applies
only to persons employed in factories including Railways who
receive wages and salaries below Rs. 200 per mensem.
According to this Act the term 'wages' means all remuneration capable
of being expressed in terms of money. It does not include the
value of any house accommodation, supply of light, water and
other things, employers' contribution to Provident and Pension
Funds, etc. Recently the provisions of the Act have been
extended to other sectors also, e.g., mines, plantations, tramways, etc.
The returns submitted under this Act contain figures in regard
to average daily employment and total wages paid during the
year.

For obtaining statistics of Agricultural Wages, on a uniform
and regular basis, the Directorate of Economics and Statistics
prepared a scheme in 1950 envisaging the collection of wages
data regarding various types of agricultural labour on a
monthly basis from each district, consolidation of district returns
at state headquarters, and their compilation on an all-India
basis by the Directorate. This information is published in the
'Indian Agricultural Wages Statistics'. Besides this the
Agricultural Labour Enquiry Committee also collected wage
statistics in different parts of the country.

Labour Bureau Index of Earnings of Factory Workers

With a view to studying the trends of the earnings
of the industrial workers the Labour Bureau started constructing
an All-India Index of Earnings of Factory Workers. This was
an annual index and had the following three parts:

(a) A State Index of Earnings, 

(b) An Industry Index, and 

(c) An All-India Index. 

The base year was 1939 and the index numbers are
available from the year 1944 up to 1951 though they were published
for the first time in 1953. The data regarding earnings used
for the purposes of the construction of this Index were those
that were obtained by the Labour Bureau under the Payment
of Wages Act. In order to facilitate comparison of the index
of earnings and the index of cost of living a new series of
earnings index was computed by shifting the base of the
Earnings Index to the year 1944 (the year which is used as the
base for constructing Cost of Living Index). A series of real
earnings was compiled by deflating the Earnings Index (1944
base) for changes in cost of living.
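
The two operations described above, shifting the base of an index and deflating it by a cost of living index, can be sketched as follows. All index values below are invented for illustration and are not the Labour Bureau's figures; the function names are likewise hypothetical.

```python
def shift_base(index_by_year, new_base_year):
    """Re-express every index number with new_base_year = 100."""
    base_value = index_by_year[new_base_year]
    return {year: 100.0 * value / base_value
            for year, value in index_by_year.items()}


def real_earnings_index(earnings_index, cost_of_living_index):
    """Deflate an earnings index by a cost of living index (same base year)."""
    return {year: 100.0 * earnings_index[year] / cost_of_living_index[year]
            for year in earnings_index if year in cost_of_living_index}


# Hypothetical earnings index on base 1939 = 100.
earnings_1939_base = {1944: 200.0, 1947: 250.0, 1951: 300.0}
# Hypothetical cost of living index on base 1944 = 100.
cost_of_living = {1944: 100.0, 1947: 125.0, 1951: 120.0}

earnings_1944_base = shift_base(earnings_1939_base, 1944)
print(earnings_1944_base)   # {1944: 100.0, 1947: 125.0, 1951: 150.0}
print(real_earnings_index(earnings_1944_base, cost_of_living))
# {1944: 100.0, 1947: 100.0, 1951: 125.0}
```

In this invented example money earnings rose by half between 1944 and 1951, but real earnings rose by only a quarter because the cost of living also rose.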

Agricultural Statistics 

'Agricultural statistics' may be broadly defined as the 
aggregate of numerical information bearing on the different 
fields of agriculture and its economy. It covers a very wide 
field which may be broadly classified under three heads:

(a) Basic agricultural statistics which provide the basic
information regarding the agricultural structure and resources of a
country. These include the number of agricultural holdings,
their size, form of tenure, fragmentation, land use, farm
management, machinery and fertilisers used, etc.

(b) Agricultural statistics proper include statistics of crop
acreages and production and statistics of livestock and their
products.

(c) Agricultural statistics in the wide sense include statistics of
agricultural stock, statistics of cost of production and prices of
crops, statistics of trade in agricultural products, statistics of
farm wages, income and expenses, rural indebtedness, statistics
of fisheries, forestry and forest products, etc.

Statistics under (a) provide basic information regarding the
agricultural structure and resources of a country and those
under (b) and (c) are required by the administration in taking
decisions on matters of agricultural and food policies.

If the agricultural statistics series of a country have to be
of any help in the formulation of agricultural policies, they
must satisfy some basic requirements:

(i) Utility. While collecting agricultural statistics, there
must always be in view the definite purpose for which they are
being collected. Any plans for collection of agricultural
statistics should be preceded by suitable planning by qualified
statisticians who understand their use in the formulation of
agricultural development plans and administrative action and
their use to the general public and the trade. There is no
greater confession of weakness in statistics than for the executive
to say “very interesting but what shall we do about it” and for
the statistician to answer “I do not know”.

(ii) Significance. The data collected must be widely
significant. F.A.O. has suggested that the significance of agricultural
data increases if instead of taking the field as the unit, the holding
or the farm is taken as the unit, because then it gives a clearer
picture of the agricultural structure and economy of a country.

(iii) Reliability. The collected statistics, if they are to be 
of use for any action, must not be mere guesses. They must 
be based on scientific and precise methods so that they may be
fairly reliable.

(iv) Adequacy of Coverage. There must be strong emphasis
on the completeness of geographical coverage of the statistics
reported. The extent of coverage is sometimes limited by the
administrative resources of a country. While stating
agricultural statistics, the extent of coverage should also be specified
in percentage.

(v) Timeliness. The usefulness of agricultural statistics is to
a very great extent related to their availability in time. Delay
in the publication of collected statistics renders them merely of
historical importance. The timeliness of agricultural statistics
has been particularly emphasised in view of the various crop
forecasts required at quite early dates.





(vi) International Comparability. Today when the countries
of the world are so much interdependent regarding their
agricultural requirements and when for their mutual benefit
they have set up a world organisation called the F.A.O. it is
very necessary that the individual countries should use
standardised definitions and terminology of agricultural statistics in
order to render the information internationally comparable.

Methods of collecting agricultural statistics vary considerably in
various countries according to the degree of their economic
development. In more developed countries the unit of
enumeration is the farm. The method of enumeration is by interviewing
the farmer and mailed questionnaire supplemented sometimes
by spot inspection. The censuses are annual in some countries
while periodic in others. The annual censuses suffer from
incompleteness. Countries such as the U.S.A., Canada and
Finland conduct censuses only periodically and improve on their
basis the precision of annual estimates made through sample
surveys. In under-developed countries there is hardly any
attempt at systematic enumeration and the agricultural statistics
are more or less guesses.

Production statistics are usually collected through the
agencies of crop reporters who send their reports to the central
office. These reports are expressed either in absolute terms or
as percentages of the given normal yield. These reports are
modified on the basis of information procured by independent
enquiries initiated by the staff of the central office.

The organisation for collecting agricultural statistics may
be centralised or decentralised. The centralised statistical
system is one in which the functions of collection and
publication of both agricultural and non-agricultural statistics are
vested in the same general statistical office. In a decentralised
statistical system, on the other hand, the compilation and
publication of agricultural statistics is done by a special
statistical office usually in the Ministry of Agriculture. A centralised
system of organisation exists in Canada while a decentralised
system exists in the U.S.A., the U.K. and India. The
desirability or otherwise of centralised or decentralised system
depends upon the state of development, the size of the country
and form of Government. For an under-developed country,
centralised system may be instituted to start with, because it
results in saving in statistical equipment, administrative cost,
etc. Also there is an advantage in the centralised system, in
the use of common classification system by different agencies
so that different series can be integrated to produce a consistent
picture of the country's economy. There is, however, a great
demerit of the centralised system in that the statisticians under
this system lose contact with their subject-matter (fields) which
seriously obstructs creative research in applied statistics. A
decentralised system having a co-ordinating statistical authority
to co-ordinate the activities of the statistical offices under
different statisticians, would be the best organisation. The
co-ordinating authority would avoid duplication of effort and lay
down standards for adoption by different statistical offices.

The statistical services and organisations vary widely in
different countries: from being most highly developed in
countries like the U.S.A. and the U.K., fairly developed in
countries such as Argentina and Mexico, poorly developed in
countries such as Syria and Cambodia, to no statistical services and
organisation in countries like Afghanistan and Saudi Arabia.
The work of collection and dissemination of agricultural
statistics for the world as a whole is being done by Food and
Agricultural Organisation with the help of individual countries
who are members of this organisation. The functions of F.A.O.
include the assemblage of agricultural data in various countries,
their analysis and critical examination in the light of
statistical services which collect them, the analysis and publication
of all such data collected and available and the attempts to
improve the existing methods of collection in different countries
and to organise a uniform and definite method of collection.
F.A.O. covers agricultural production in its various aspects such
as forestry, fisheries, nutritional standards of the various peoples
of the world, international trends in agriculture, data on
agricultural population, agricultural income, credit, loans and a
vast variety of related subjects.

The F.A.O. undertakes the publication of a number of
important journals. The most important of these is the 'Year
Book of Agriculture' published annually. Two other important
journals published by the F.A.O. are: Quarterly Economic
Review of Food and Agriculture, and Monthly Bulletin of
Agricultural Statistics.

Agricultural Statistics in India

In India, all work relating to the compilation and
dissemination of agricultural statistics is carried out by the Directorate
of Economics and Statistics in the Central Ministry of Food
and Agriculture. The work relating to research in statistical
sampling and improved methodology in the field of agricultural
statistics is carried out by the Statistics Branch of the Indian
Council of Agricultural Research under this Ministry. It
works in close co-operation with the Directorate.

Statistics of Land Utilisation

Detailed statistics of land utilisation, which mainly give
areas of land put to different uses, are continuously
available since [year illegible], although their geographical coverage and
scope has been gradually expanding. At present information
is available for nearly [figure illegible] per cent of the total area of the
Indian Union. These are at present published in the annual
Indian Agricultural Statistics (Vols. I and II) issued by the
Directorate of Economics and Statistics, Ministry of Food and
Agriculture, Government of India.

Up to 1949-50 the classification of land use for purposes
of land utilisation statistics consisted of the following five broad
heads:

(i) Area under forests, 

(ii) Area not available for cultivation,

(iii) Uncultivated land excluding current fallows, 

(iv) Area under current fallows, and 

(v) Net area sown. 


This classification did not provide a clear picture of the
actual area under different categories of land-use, so very
necessary for agricultural planning. Even the data with regard
to the different states were not comparable due to lack of
uniformity in methods of classification and in definitions of the
different classes. To remove these defects the existing
classification has been improved and standard definitions of the
various classes to be followed by all States have been laid down.

The new classification is as follows:

(i) Forests. These include all actually forested areas or
lands classed or administered as forests under any legal
enactment dealing with forests whether state-owned or private. If
any portion of such land is not actually wooded but put to
some agricultural use, that portion is included under the
appropriate heading of cultivated or uncultivated land.

(ii) Land put to non-agricultural uses. This stands for all
lands occupied by buildings, roads and railways or under water,
and other lands put to uses other than agricultural.

(iii) Barren and unculturable land. This includes all barren 
and unculturable land like mountains, deserts, etc. 

(iv) Permanent pastures and other grazing lands. This includes 
all grazing land whether permanent or not. 

(v) Miscellaneous tree crops and groves not included in the net
area sown. This includes all cultivable land not included under
net area sown but put to some agricultural use, viz., land
under fuel, bamboo bushes, etc.

(vi) Culturable Waste. This covers all land available for
cultivation but not taken up for cultivation or abandoned after
a few years for one reason or other.

(vii) Current Fallows. This class comprises cropped areas
which are kept fallow during the current year.

(viii) Other Fallow Land. This includes all lands which
were taken up for cultivation but are temporarily out of
cultivation for a period of not less than one year and not more than
five years.

(ix) Net area sown. This consists of net area sown with
crops and orchards.





Crop Acreage Statistics

In areas with temporary settlement the crop acreage figures
are collected through field inspections by the local agents of
governments called 'Patwaris', there being one Patwari in
every village or in a group of 3 to 5 villages. The Patwaris
work under the supervision and control of the Land Records
Department of each state. Apparently the acreage figures
reported by the Patwaris seem to be fairly accurate because
firstly, they are collected from records which are expected to be
kept up-to-date by actual inspection of all plots and secondly,
they are subject to inspection by superior revenue officials.
A spot check of Patwari records in a number of selected areas
of U.P., Vindhya Pradesh, Punjab and Pepsu was organised
in the spring crop season of 1949-50 at the instance of the
Ministry of Finance. A number of experienced investigators
accompanied by Patwaris were sent round to compare what
they actually found on the land with the corresponding entry
in the Patwari records. The absolute discrepancy between the
acreage figures recorded by the Patwari and that reported by
the checkers was found to be below 60%. A more or less
similar result was reached by another spot check that was
carried out in parts of Bihar, U.P., Punjab, Madhya
Pradesh, Pepsu and Madras during the spring crop season of
1950-51.
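
A spot check of this kind yields pairs of figures, one from the Patwari record and one from the checker, for each plot visited. The sketch below shows one way an absolute discrepancy might be summarised. The formula (sum of absolute differences as a percentage of the checkers' total) is an assumption, since the text does not define the measure, and the plot figures and function name are hypothetical.

```python
def absolute_discrepancy_percent(patwari_acres, checker_acres):
    """Sum of absolute plot-wise gaps as a percentage of the checkers' total.

    Assumption: the text does not define 'absolute discrepancy';
    this is one plausible reading.
    """
    if len(patwari_acres) != len(checker_acres):
        raise ValueError("both lists must cover the same plots")
    total_gap = sum(abs(p - c) for p, c in zip(patwari_acres, checker_acres))
    return 100.0 * total_gap / sum(checker_acres)


# Hypothetical acreages for four plots.
patwari = [10, 8, 12, 10]
checker = [10, 9, 11, 10]
print(absolute_discrepancy_percent(patwari, checker))  # 5.0
```

Taking absolute differences prevents over-statements on some plots from cancelling under-statements on others, which a simple comparison of totals would allow.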

Chief Causes of Inaccuracies

The method of collecting area figures through
Patwari-Kanungo estimates gives rise to inaccuracies for the following
reasons:

(a) The Patwaris do not have the necessary education and
are conservative. Moreover, they have a heavy burden to bear.
They are the only officials in the villages and have to
undertake all the work of their village connected with various
departments in the government. Thus they get very little time to
devote to the proper compilation of agricultural statistics. The
situation could, however, be improved if they are given
directions to give priority to the work of collecting agricultural
statistics.

(b) Another cause of the inaccuracy of Patwari records is
the deliberate bias of the Patwaris which is more or less
incurable, because very often Patwaris are interested in
under-estimating or over-estimating the crop for fear of procurement
of foodgrains or to advocate diversion of acreage from food
crops to non-food crops, or to influence the assessment of land
revenue.

(c) Patwari records could be improved if they are
subjected to proper supervision by higher officials. The work of
the Patwaris is apparently expected to be supervised
by Tehsildars or Kanungoes, but in the midst of their
administrative duties, they also complain of lack of sufficient time to
attend to the supervision of agricultural statistics.

(d) A great amount of inaccuracy is lent to the compilation
of agricultural statistics by the lack of uniformity in definitions
in different parts with Patwari system of reporting. For
instance, it is uncertain whether 'acreage under crops' means
the area actually sown or the area successfully cropped. The
government announcements on this point have always remained
elusive.

(e) The estimation of acreage under mixed crops is a 
source of great difficulty and inaccuracy. A formula that was
evolved in 1895 to separate the areas of mixed crops in various 
provinces, still continues to be used. In view of the great 
changes in the composition of crops that must have taken place 
since then, a check on the ratios of the old formula was 
suggested. Another suggestion was that quantitative methods 
of crop cutting experiments should be followed. Neither of the 
two suggestions has been approved by the government on 
grounds of cost involved. 

(f) Another factor responsible for inaccuracy is the
inclusion of field ridges in the measurement though they are neither
sown nor cropped. In order that the estimate be accurate
some allowance must be made for the area occupied by the
ridges.




(g) Lastly, there is every likelihood of mistakes in
compilation. From Patwari crop books the totals are sent to tehsil
headquarters, then the totals of figures of all the villages of the
tehsils are sent to district headquarters and so on till they reach
the centre. Usually no steps are taken to verify the
calculations of lower stages. Mistakes occur even in totalling of a
large number of figures and the magnitude of such mistakes
may at times be quite large.

But in spite of the various inaccuracies enumerated above
there is a strong case for the continuation of data collection
through the Patwaris. For large administrative areas such as
districts or states, figures can, of course, be obtained by random
sample surveys. But for individual villages for which figures
are required for administrative purposes, surveys will be
disproportionately costly.

For improving the quality of the statistical data collected by
the Patwaris, various measures have been suggested. The state
governments have been advised to reduce the jurisdiction of the
Patwaris and some of them have already started doing it. Steps
have also been taken to train the Patwaris in the technique of
data collection. Another factor which will contribute to the
reliability and timeliness of agricultural statistics is the
provision of adequate supervision. The state governments have been
advised to appoint district statistical officers for this purpose. It
is also suggested that to make the supervision really effective it
should be rationalised. This means that the supervisory staff
should pay surprise visits to plots selected in a random manner
to verify the accuracy. This method will have a two-fold
advantage, viz., it puts the Patwari in the psychological frame
of mind to report reliable data and it will be possible to
develop correction factors for adjusting the figures as reported by
the Patwaris.
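
The idea of a correction factor derived from random surprise checks can be illustrated with a short Python sketch. The ratio form used here (verified acreage divided by reported acreage on the checked plots) is an assumption, as the text does not say how the factors are to be built; all names and figures are hypothetical.

```python
def correction_factor(reported_sample, verified_sample):
    """Ratio of verified to reported acreage on the randomly checked plots.

    Assumption: a simple ratio of totals; the text does not
    prescribe the form of the correction factor.
    """
    if len(reported_sample) != len(verified_sample):
        raise ValueError("both lists must cover the same plots")
    return sum(verified_sample) / sum(reported_sample)


# Hypothetical check of four randomly selected plots (acres).
reported = [10, 20, 20, 30]   # as entered by the Patwari
verified = [9, 18, 17, 26]    # as found by the supervisor

factor = correction_factor(reported, verified)
print(factor)                 # 0.875

# Adjust a hypothetical reported total of 1,600 acres.
print(1_600 * factor)         # 1400.0
```

Because the plots are selected at random, the same factor can reasonably be applied to the unchecked figures of the area from which the sample was drawn.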

In the area with permanent settlement no detailed crop
records are required to be kept for revenue purposes, so that
most of the estimates are mere guesses made by village
headmen. But when various committees criticised the situation,
different steps for data collection were taken by different States.
In West Bengal acreage figures are obtained by random sample
surveys. Reporting based on complete enumeration has been
adopted in Bihar since [year illegible]. In all other territories statistics
continue to be obtained only by subjective estimates of
administrative officials.

But such a state of affairs cannot be allowed to continue for
long, and the gaps in the statistics have to be filled
immediately. It has to be decided whether the method of complete
enumeration or random sample surveys has to be adopted.
Results of complete enumeration are often not as reliable as
they are assumed and the cost involved is many times more
than in a random sample survey. The U.N. Sub-Committee
on statistical sampling, therefore, suggests that if the
administrative requirements are not so pressing as to set up an elaborate
system of plot to plot enumeration, random sampling methods
should be immediately adopted to obtain reliable statistics.
There will, of course, be difficulty in the case of certain
unsurveyed areas having no village maps with boundaries of
individual plots shown in them. But a suitable frame can be
constructed with the help of topographical or air survey maps, lists of
holdings, etc., which will usually be available. Even complete
mapping may be done for the sample villages. It should be
noted that plot to plot enumeration in these areas will involve
only greater difficulties and higher costs.

Methods of obtaining production statistics. Today when trade
in agricultural products has assumed an international character,
the necessity of collecting reliable production statistics hardly
needs any emphasis. Every merchant is placed at a
disadvantage unless he knows with reasonable accuracy the size of the
crop with which he has to deal. Undue speculation in the
prices of agricultural crops can only be checked if there are
adequate production statistics. Then, a sound rationing system
can only be built up on the basis of accurate statistics.

It should be noted in this connection that actual production
statistics shall only be of historical significance. What is of
value to the traders is the estimates of yield. They need a
forecast of the probable size of the crop in advance of the
harvest. In India there are generally three forecasts for each crop.
The first forecast gives the area under crops and the harvests
expected. The second includes or excludes areas newly sown
or areas thrown out of cultivation and on that basis indicates
the character of the crop and the percentage yield. The final
forecast reveals the total area sown and the final estimates of
yield and the prices, exports, etc., of crops. Each of these
different phases of forecasts appears after an interval of three months
and includes all the important crops of India.

Method of crop forecasting: estimating yields. The official
practice of obtaining yield estimates which is still followed in
many states is based on the concepts of normal yield and the
condition factor. According to this practice:

Yield = Area × normal yield × condition factor or seasonal
factor.

The problems with regard to area statistics have already
been considered above. The remaining two components, the
normal yield and the condition factor, are discussed below.

Normal yield factor. Except in the Punjab, it is the practice
to give the estimated yield as a percentage of the normal yield
expressed usually in the form of an anna-condition factor. There
is a great deal of confusion as regards the meaning and implications
of the term 'normal'. At one place it is understood as
"the yield which past experience has shown to be the most
generally recurring crop in a series of years; the typical crop of
the local area, the crop which the cultivator has a right (as it
were) to expect and with which he is or should be content,
while if he gets more, he has reason to rejoice and if less he has
reason to complain." Briefly it is the "average out-turn on
average soil in a year of average character as deduced from a
consideration of the information obtained on experiments
made during the year under review." The confusion arises
from treating 'normal' as an average. In fact, the two terms
have different meanings and should not be used as co-extensive.
Average generally signifies a mean of the past figures.
The word 'normal', on the other hand, is used to indicate
the crop which ordinarily a cultivator expects. As described
in the Manual of Crop Forecast of the Government of India,
"this normal or average yield will not necessarily correspond
with the average of a series of years' figures which is
indeed an arithmetical abstraction and may possibly never
occur." According to the Bureau of Statistics, U.S. Department
of Agriculture, "a normal condition is not an average
condition but a condition above the average giving promise of
more than an average crop. The normal may be described as
the condition of perfect healthfulness, unimpaired by drought,
hail, insects or other injurious agency and with such growth
and development as may reasonably be looked for under these
favourable conditions." Considered in the light of this
definition Indian crops are generally sub-normal.

The Departments of Agriculture in different states are
responsible for the fixation of the 'normal yield'. The estimate
of the normal crop is made on the basis of crop cutting experiments.
The method is to select some average plots according
to the rules laid down by the state authorities and to sow and
harvest the crop on these fields before the officers of the agricultural
or revenue department. The observations are passed on
to the Director of Agriculture who takes into consideration
various other facts and figures and finally fixes the normal yield.
The figures fixed are revised after every five years on the basis
of the experience and the information of the period.

Besides the objections against the very concept of 'normal
yield', the method of fixing the normal yields described above
has been subjected to a number of criticisms. It is held that
the few scattered crop cuttings of small plots that are made
cannot be expected to give reliable results. Then the purposive
selection of plots of land for experiments by the local
officers leaves scope for personal prejudice and bias which is
undesirable from the statistical point of view. The only alternative
to this is the system of Random Sampling which has its
own problems which are discussed later. It has further been
experienced that the persons entrusted with the work, i.e., the
patwari, the kanungo and the district officer, are such as are
already overburdened with other work so that they pay little
serious attention to this work. The districts selected for crop
cutting experiments continue to be those that were approved
some fifty years back. At least the boundaries of these districts
need to be revised and in doing so it would be better if instead of
taking into account the political boundaries, the economic
homogeneity of the units is emphasised. Another criticism
relates to the size of the plot of land selected. Usually the
size is 1/10th of an acre though in the case of sugarcane it is
1/40th of an acre. It is suggested in this connection that
whether the size be smaller or bigger than the present fixed
size, it should be the average size on which an ordinary cultivator
sows a particular crop. For better utilisation of the
data collected for fixation of normal yields it is also suggested
that instead of looking at the figures collected yearly only at
the end of five years, constant study of these should be made
and improvements and revisions made as and when necessary.

Condition factor or the seasonal factor. It is the practice to describe
the condition of the crop in a particular year in relation to
the normal crop as a fraction of the normal crop, the fraction
being denoted as so many annas when the normal is taken at 16
annas. That is why this factor is also called the anna-condition
factor. The determination of this factor has no statistical basis.
It is a mere eye-estimate by the village Patwari. The village
Patwari goes to the fields during the growth of the crop and at
the time of the harvest and on the basis of his experience of the
crops of a number of previous years he decides as to where the
crop of that particular year stands as compared to the normal.
The Patwaris of different villages forward their opinions to the
Tehsildar who reports some sort of an average opinion to the
district officer, who in turn modifies the figure as he deems
necessary and reports the same to the Director of Agriculture.
The Director of Agriculture supplements this information from
his own sources and finally employs it in conjunction with the
normal yield to determine the actual amount of crop.
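
The official practice described above reduces to a simple product of area, normal yield and condition factor. The following is only a minimal sketch in Python, assuming the normal crop is taken at 16 annas. The function names and the figures used (1,000 acres, a normal yield of 800 lb. per acre, a 12-anna crop) are hypothetical and are not taken from any official return.

```python
from fractions import Fraction

NORMAL_ANNAS = 16  # assumption: the normal crop is taken at 16 annas


def condition_factor(annas, normal_annas=NORMAL_ANNAS):
    """Express the anna estimate as a fraction of the normal crop."""
    if annas < 0:
        raise ValueError("annas cannot be negative")
    return Fraction(annas, normal_annas)


def estimated_yield(area_acres, normal_yield_per_acre, annas):
    """Yield = area x normal yield x condition factor."""
    return area_acres * normal_yield_per_acre * condition_factor(annas)


# Hypothetical figures: 1,000 acres, normal yield 800 lb. per acre, 12-anna crop.
print(estimated_yield(1000, 800, 12))  # 600000
```

A 12-anna crop is three-quarters of the normal, so the estimate is three-quarters of 800,000 lb. Any error in the eye-estimate of annas passes directly into the final figure in the same proportion.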





The defects of this system of estimating the condition factor
are very obvious. Even the most trained persons cannot be
expected to do justice to their job in view of the natural errors
of judgment, personal bias and guessing inaccuracies that are
unavoidably involved in this system. The situation is worsened
by the fact that the crop reporters are mostly uneducated
and have little awareness of the grave consequences of their
wrong judgment. They have grown to be pessimists by nature
so that they have a tendency to exaggerate scarcity conditions
and under-estimate better crops. This is so, firstly
because of revenue considerations and secondly, because they
think very highly of the 16-anna standard crop in relation to
which they fix the condition factor. Further it has been noted
that these men have a peculiar psychological favour for even
figures.

A consideration of the three factors involved in the determination
of yield figures has thus revealed that the
situation is far from satisfactory. Further there are a number
of other factors that have reduced the utility of final conclusions.
No uniform method is followed in the transformation
of the data from the primary stages to higher stages. Sometimes
the arithmetic average is used, though commonly the mode
is employed for this purpose. The crop reports received are
very few so that they fail to lead to representative conclusions.
Then the delay in the publication of these factors renders them
sometimes obsolete and merely of historical value.

Against our subjective forecasts, the U.S.A. has made some
experiments in having objective forecasts based on weather
conditions, soil fertility, manuring, etc. More accurate conclusions
are no doubt obtainable but as the conditions exist in
India the method cannot be employed here in the near future.

Random sampling method for estimating yield. The only possible
way of improving agricultural statistics is the extensive
use of random sample surveys as the method of estimating
yield. The pioneering attempt at making an objective estimate
of annual yield directly from a large number of sample cuttings
without recourse to normal yield or the seasonal condition
factor was made by Hubback in 1923-25. Hubback recognised
that the only way to estimate yield satisfactorily was by approximating
as far as possible to random sampling. For this purpose
a large number of samples scattered over the tract are to
be harvested. Hubback suggested that each sample should be
of a very small size, viz., 13.6 sq. ft. The process of sampling
was that at each centre workers should go out a fixed distance
and circling round, return from another direction, harvesting
one sample in each field where harvesting was in progress.
Based on this plan Hubback conducted a number of surveys
on paddy yields in Bihar and Orissa.

Although there was emphasis on randomisation in his
surveys, they lacked it at a number of points. The centres
were not selected randomly and sampling at the centres was
limited only to those fields where harvesting was in progress
on the day of the sampler's visit. Moreover, there was scope in the
plan for the exercise of personal bias in the location of the
sample. Further the extremely small size of the sample was
found to lead to an appreciable over-estimation of yield.

A further advance in the direction of estimating crop yield
by a random sample survey has been made by Sri Mahalanobis.
His design of survey and his plan of field work are very similar
to those of Hubback. Parties of samplers are
required to go rapidly from place to place during the harvesting
season, cutting sample plots out of only those fields which
are ready for harvesting on the day of the sampler's visit. Here
again the principle of randomisation is violated, because every
field bearing the particular crop does not get due chance of
being included in the sample: in some fields at the time
of the sampler's visit the crop will have been harvested or the
crop will not be ready for harvest. But this difficulty is
inherent in Mahalanobis' approach and so his plan is rendered
unsuitable for field surveys. Mahalanobis' plan also makes
use of a portable frame to cut sample plots. But the size of the
sample plots suggested by him is larger. A plot cut of 50 to
100 sq. ft. is considered the right one.




According to another plan sample plots were to be distributed
over the area sampled. In Bengal this was achieved by
dividing the entire province into zones of 64 sq. miles each.
The zones were grouped into a certain number of blocks. A
certain number of zones were selected from each block. Each
zone was divided into 64 cells of one sq. mile each. So from
the selected zones a certain number of cells were selected and in
each selected cell four grids of 2.25 acres each were picked out
at random. In each selected grid two sample cuts were to be
harvested in one or two fields growing the particular crop.

Any of the above plans of crop cutting experiments by
methods of random sampling is rendered impracticable when
the problems of administration and of the heavy expenditure
involved are taken into account. More practicable would be
to develop a plan which will fit into the existing administrative
structure and will not necessitate any heavy additional expenditure
on field staff.

Such a plan of random sampling for estimating crop yield
has been suggested by Sukhatme. According to him a random
sample of a certain number of plots can be drawn as follows :

Every tract for which the average yield is to be estimated
must be having administrative sub-divisions. From each sub-division
a certain number of villages can be selected randomly
out of the already available list of villages in the sub-division.
Lists shall have to be prepared showing the number of
fields that are growing a particular crop in each of the selected
villages. These can be prepared easily with the help of the
Patwaris. From out of these lists the requisite number of fields
can be selected randomly. On these selected fields the sample
shall be harvested by locating a plot of suitable dimension in a
random position in the field.
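
The successive stages of this plan (villages within sub-divisions, fields within villages, a random position within each field) may be sketched as below. This is an illustration only, not Sukhatme's own procedure in detail: the tract, the village and field names, the numbers drawn at each stage and the way the random position is recorded (as fractions of the length and breadth of the field) are all assumptions made for the example.

```python
import random


def draw_sample_plots(tract, villages_per_subdivision, fields_per_village, seed=None):
    """Select villages, then fields, then a random plot position in each field.

    tract maps each sub-division to a dict of village -> list of fields
    growing the crop (the lists prepared with the help of the Patwaris).
    """
    rng = random.Random(seed)
    plots = []
    for subdivision, villages in tract.items():
        names = sorted(villages)
        n_villages = min(villages_per_subdivision, len(names))
        for village in rng.sample(names, n_villages):
            fields = villages[village]
            n_fields = min(fields_per_village, len(fields))
            for field in rng.sample(fields, n_fields):
                # Position of the plot as fractions of field length and breadth.
                position = (rng.random(), rng.random())
                plots.append((subdivision, village, field, position))
    return plots


# Hypothetical tract with two sub-divisions.
tract = {
    "Sub-division A": {
        "Village 1": ["f1", "f2", "f3"],
        "Village 2": ["f4", "f5"],
        "Village 3": ["f6", "f7", "f8"],
    },
    "Sub-division B": {
        "Village 4": ["f9", "f10"],
        "Village 5": ["f11", "f12", "f13"],
        "Village 6": ["f14", "f15"],
    },
}

for plot in draw_sample_plots(tract, 2, 2, seed=1):
    print(plot)
```

Because every selection is left to the random number generator, the local officer has no discretion in choosing the village, the field or the spot where the plot is laid out.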

In the above plan, the sample plots having been selected
in strict random fashion, the average yield of the harvested
plots would give an unbiased estimate of the average yield for
the whole tract. The accuracy of the sample average can be
raised to any desired level by increasing the number of plots in the
sample. The extent of accuracy achieved can then be measured
by determining its standard error.
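
As an illustration of how the sample average and its standard error may be computed, a short sketch in Python is given below. It uses the ordinary formula for a simple random sample (sample standard deviation divided by the square root of the number of plots) and ignores stratification and the finite population correction, which is an assumption made for simplicity. The plot yields are hypothetical.

```python
import math
import statistics


def mean_and_standard_error(plot_yields):
    """Return the sample mean and its standard error, s / sqrt(n)."""
    n = len(plot_yields)
    if n < 2:
        raise ValueError("at least two plots are needed")
    mean = statistics.mean(plot_yields)
    standard_error = statistics.stdev(plot_yields) / math.sqrt(n)
    return mean, standard_error


# Hypothetical yields (in lb.) of four harvested plots.
mean, se = mean_and_standard_error([10, 12, 14, 16])
print(mean, round(se, 3))
```

Since the standard error varies inversely as the square root of the number of plots, four times as many plots are needed to halve it.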

The method that is now adopted in most of the states is
based on Sukhatme's plan. This method "consists in stratifying
a district by its administrative sub-divisions, selecting
randomly within each sub-division a certain number of villages
in proportion to the area under that crop, and locating within
each village two random fields under the crop for making a
crop-cutting plot in a random position in each."* This plot is
usually rectangular and is equivalent to 1/80th of an acre.
This plot is marked with the help of pegs, string, measuring-tape,
etc. The method of harvesting adopted is the same as
practised by people of that area. On the basis of yield of these
sample cuts estimates of the yield per acre are made. On an
average between 100-200 plots are harvested in each district.
The field work in this connection is done by the staff of Land
Records and Agricultural departments of the states under the
guidance of the states' statisticians. The co-ordination of the
plans and results of the surveys is done by the Centre.
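
The step from the sample cuts to a per-acre figure is a matter of simple proportion. A minimal sketch follows, assuming every plot is exactly 1/80th of an acre and that the plots are given equal weight. The function name and the plot yields are hypothetical.

```python
PLOTS_PER_ACRE = 80  # each crop-cutting plot is taken as 1/80th of an acre


def yield_per_acre(plot_yields_lb):
    """Estimate yield per acre from the mean yield of the sample plots."""
    if not plot_yields_lb:
        raise ValueError("no plot yields supplied")
    mean_plot_yield = sum(plot_yields_lb) / len(plot_yields_lb)
    return mean_plot_yield * PLOTS_PER_ACRE


# Hypothetical yields (in lb.) of three plots.
print(yield_per_acre([10, 12, 14]))  # 960.0
```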

Livestock Statistics 

With regard to our livestock and poultry, statistics relating
to only their numbers are collected and that too at intervals
of five years, during the quinquennial livestock census. The
first All-India Livestock Census was held in the winter of
1919-20 and the second during the winter of 1924-25. It was
subsequently felt that the period of enumeration was too wide
and that the value of the results was to some extent vitiated
by the inter-provincial movement of cattle. The duration
of the census enumeration was, therefore, reduced to only one
month.

The last census was conducted in 1956 with 15th April as
the reference date. This census was conducted on an improved
basis as compared to earlier censuses. Livestock census officers
were appointed for each state to be in overall charge of the
census. This was done with a view to expedite the compilation
of the data and improve their reliability. The basic concepts
and definitions were also laid down for adoption by all the
states. The enumeration was done for the first time on a
household basis. With a view to verify the census count an
ad hoc sample verification survey was organised by the NSS
during June-July 1956. For the rural areas 1,924 villages and
for urban areas 340 blocks were selected.

* See 'Agricultural Situation in India', November 1954, article by

The next census was held in April 1961. For this purpose 
the lists of households prepared for the 1961 population census 
were available. A census calendar was drawn up for ensuring 
the completion of various operations connected with the census 
according to schedule. Provision was also made for rationalised 
supervision in 5% villages and urban areas to be selected at 
random. In the selected villages 10% of the households were
selected by systematic sampling with a random start for 
rationalised supervision. Provision was also made for adequate 
training to enumerators and supervisors. 
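
Systematic sampling with a random start, as used for selecting the 10% of households, may be sketched as follows. This is an illustration under assumptions: the household list is hypothetical, and the sampling interval is taken simply as the reciprocal of the sampling fraction (every tenth household for 10%).

```python
import random


def systematic_sample(items, fraction=0.10, seed=None):
    """Take every k-th item after a random start, where k = 1 / fraction."""
    rng = random.Random(seed)
    interval = round(1 / fraction)   # 10 for a 10% sample
    start = rng.randrange(interval)  # random start within the first interval
    return items[start::interval]


# Hypothetical list of 100 households numbered 1 to 100.
households = list(range(1, 101))
print(systematic_sample(households, 0.10, seed=7))
```

Only the starting household is left to chance. Once it is fixed, the rest of the sample follows automatically from the list, which makes the selection easy to carry out and to check in the field.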

Livestock and poultry are classified for 1961 in the following
manner :

Livestock is classified into two groups, viz., bovine and
other livestock. Bovines are classified as cattle and buffaloes,
and further classified according to sex and age. Thus the
following three categories are obtained :

1. Bulls and he-buffaloes over three years,
2. Cows and she-buffaloes over three years, and
3. Young stock.

The first category is sub-divided into :

(i) used for breeding only,
(ii) used for breeding and work both,
(iii) used for work only, and
(iv) not in use for breeding or work.

The second category is sub-divided into :

(a) in milk,
(b) dry,
(c) not calved even once,
(d) used for work only, and
(e) not in use for work or breeding purposes.

The third category is sub-divided according to age as :

(i) below one year, and
(ii) between one and three years.

Each one of these categories is further divided according
to sex.

Poultry is classified as :

(i) Fowls,
(ii) Ducks, and
(iii) Others.

Fowls are further classified as hens, cocks and chickens.

Other Livestock. This includes sheep, goats, horses and
ponies, mules, donkeys, camels and pigs. Sheep and goats are
classified according to age into two categories, viz.,

(i) those up to one year, and
(ii) those over one year.

The latter category is further sub-divided according to sex.
Horses and ponies are first classified according to age into three
classes, viz.,

(i) over three years,
(ii) up to one year, and
(iii) one to three years,

and then each class is subdivided according to sex. Donkeys
and pigs are classified according to sex only and mules and
camels only according to age.

Forest Statistics 

The principal forest statistics are those relating to area under
forest, volume of timber, outturn of timber and other forest
produce, employment in forestry and forest industries, revenue
and expenditure and foreign trade in forest produce. These
are published in the annual 'Indian Forest Statistics' issued by
the Directorate of Economics and Statistics.

Area. Statistics of area are available according to outturn
point of view, legal status, composition, etc.

(i) Forest Area from Outturn point of view. According to this
forest area is classified into two groups, viz., 1. Merchantable,
and 2. Inaccessible.

Merchantable forests are defined as those forests which are
within the reach of economic management or exploitation as
sources of forest products including immature forests and
managed forests where fellings are prohibited.

Inaccessible forests are defined as those which are not yet 
managed or exploited owing to inaccessibility, or on account 
of the forest produce in them being unsaleable. 

(ii) Forest Area by Legal Status. Information with regard to
reserved forests, protected forests and unclassed forests is given
separately. Reserved forests are those that are intended mainly
for timber production and in which rights of grazing and
cultivation are seldom allowed. In protected forests the
restrictions are not rigid. The unclassed forests refer to the
inaccessible forests or unoccupied waste.

(iii) Forest Area by Composition. Under this the information
is available in respect of conifers and broad leaved varieties, the
latter being sub-divided into Sal, Teak, and miscellaneous.

Outturn. The outturn of forest produce is given under the
following heads :

(1) Timber, (2) Round wood, (3) Pulp and Matchwood,
(4) Firewood, (5) Charcoal wood, and (6) Outturn of Minor
Forest Produce such as :

(a) Fibres and Flosses,
(b) Bamboos and Canes,
(c) Gums and Resins, and
(d) Other sorts of Minor Forest Produce.

Employment of Labour in Forestry and Forest Industries. Information
regarding number of persons employed on the first of each
month in forestry and forest industries together with details of
persons dependent on them is given under this head.

Revenue and Expenditure. Figures of revenue and expenditure
of the forest department are given in detail.

Foreign Trade. Figures of quantity and value of imports and
exports of wood and timber and value of imports and exports
of minor forest produce and wood products are given for the
year based on the Monthly Statistics of Foreign Trade of
India.

Industrial Statistics

The state of industrial development of a country is the chief
index of its economic progress. As such there is a growing
need of reliable and up-to-date statistics on various aspects of
industrialisation. Industrial statistics comprise the statistics
on industrial capital, industrial labour, volume and cost of
industrial output, methods of production, use of power, etc.
All these statistics form an important source of information to
administrators and businessmen.

In advanced countries industrial statistics are classified into :

(1) Capital :
(a) Authorised, Issued and Paid-up Capital,
(b) Fixed Capital, and
(c) Working Capital.

(2) Labour :
(a) Number of persons employed, classified as adults or
children, male or female, and
(b) Salaries and wages paid to various classes of workers.

(3) Cost of Production :
(a) Quantities of raw material consumed,
(b) Value of raw material consumed, and
(c) Other items.

(4) Output :
(a) Quantity and value of the main product, and
(b) Quantity and value of by-product.

(5) Power :
(a) Details of power used, its cost and consumption.

Nature and extent of industrial statistics available in India
pertaining to large scale industries. This can be studied under the
following three major groups :

(i) General Statistics,
(ii) Statistics of Output and Cost, and
(iii) Statistics of Power consumed.

(i) General Statistics. These give information about the
number of factories, number of personnel, the amount of
capital invested, etc.

The sources of information on these are :

(a) Large Industrial Establishments in India published by
the Labour Bureau,
(b) Statistical Abstract (Annual) published by the Central
Statistical Organisation,
(c) Statistics of Factories in India (Annual) published by
the Labour Bureau, and
(d) Report on the Working of Joint Stock Companies in India
(Annual) issued by the Registrar of Joint Stock Companies.

The information contained in the above publications is as
follows :

1. Number of Factories. It is given in publication (a) above.
The statistics relate to such establishments only which employ
not less than 20 persons.

The factories are classified into :

(i) Textiles,
(ii) Engineering,
(iii) Minerals and Metals,
(iv) Food, Drink and Tobacco,
(v) Chemicals, Dyes,
(vi) Paper and Printing,
(vii) Processes relating to wood, stone and glass,
(viii) Processes connected with skins and hides,
(ix) Gins and Presses, and
(x) Miscellaneous.

Each of the above groups is further subdivided into smaller
categories. Information is given about seasonal and perennial
factories. A seasonal factory is one which does not work for
more than 180 days in a year.

Information relating to number of factories is found in the
Statistical Abstract also. But the figures of the two publications
(a) and (b) do not tally with each other because publication
(b) includes also factories employing less than 20 persons while
such factories are excluded in publication (a).

2. Average daily number of persons employed. This information
is available in publication (a). It is obtained by dividing the total
attendance on all working days by the total
number of working days.
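
The calculation may be illustrated by a short sketch in Python. The function name and the attendance figures are hypothetical; each figure stands for the total attendance of all shifts on one working day.

```python
def average_daily_employment(daily_attendance):
    """Total attendance on all working days divided by the number of working days."""
    working_days = len(daily_attendance)
    if working_days == 0:
        raise ValueError("no working days recorded")
    return sum(daily_attendance) / working_days


# Hypothetical attendance on three working days.
print(average_daily_employment([100, 110, 90]))  # 100.0
```

Days on which the factory did not work are left out of the list altogether, so that a seasonal factory is not shown with a misleadingly low average.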

Statistical Abstract gives information relating to the number
of persons employed classified into adults or children, male or
female. Statewise information is also available in the Statistical
Abstract. Publications (c) and (d) also give this information.

3. Capital invested in the industries. The Statistical
Abstract and the Report on the Joint Stock Companies contain
the figures relating to the amount of invested capital.

Separate information is not available with regard to fixed
capital or working capital or the extent of foreign capital in
our industries. The information is, however, classified into
authorised capital, paid-up capital and debentures. But the
information available is not complete and is of doubtful
accuracy.

(ii) Statistics of Output and Cost. Statistics relating to the
output and cost have greatly improved during the last ten
years. The situation in respect of the cotton mills industry is
most satisfactory, as these mills are under a statutory obligation
to publish necessary information due to the passing of
the Cotton Industry (Statistics) Act in 1926. All information
regarding cotton mills is published in the Monthly Statistics of
Cotton Spinning and Weaving in Indian Mills issued by the Directorate
of Industrial Statistics. The output statistics are
expressed either in lb. or in yards, as the case may be.

The Statistical Abstract (Annual) also publishes a summary
of these statistics. Monthly Statistics of Production of Selected
Industries of India also gives information relating to output.
It contains information about output and costs.
It contains figures relating to only those
industries which supply the information voluntarily. Therefore,
it is not satisfactory.

The Indian Trade Journal published weekly gives information
pertaining to production of sugar including stocks of sugar.
Figures of production are compiled by the Imperial Institute
of Sugar Technology. The Journal also publishes the memorandum
on sugar production issued by the Director of the Imperial
Institute of Sugar Technology.

(iii) Statistics of Power Consumed. It is published in the
Monthly Survey of Business Conditions in India giving information
relating to electric power produced and consumed in India.
From November 1949 onwards the figures are available about
the total power generated, and the total power consumed.
The information supplied is incomplete because the power
produced in P.W.D. and Military stations is excluded altogether.


The census of manufacturing industries was conducted
annually by the Directorate of Industrial Statistics under the
authority of the Industrial Statistics Act, 1942. This Act was
repealed by the Collection of Statistics Act, 1953 which came
into force on 10th November, 1956. The census for 1958
and onwards would be under the authority of this Act. This
census is an attempt to collect and interpret comprehensively
statistics relating to industries in a systematic manner. The
statistics of the census for the years 1946 to 1955 have been
published, and those of 1956 are ready and are likely to be
released shortly.

The 1955 census (which is the latest available at the time of 
writing) covered the following states— Bombay, West Bengal, 
Madras, Uttar Pradesh, Bihar, Madhya Pradesh, Punjab, 
Delhi, Orissa, Rajasthan, Patiala and East Punjab States 
Union, Saurashtra, Assam, Ajmer, Himachal Pradesh, Vindhya
Pradesh, Kutch and Travancore-Cochin.

For the purposes of the Census of Manufactures, industry
has been classified into a number of groups. The census has so far been
confined to 29 of these industries, of which the producer gas
industry no longer exists, and collects information only from
those factories which come under the purview of Section 2 (j)
of Indian Factories Act.

The collection of returns from individual factories for the
purposes of the census is the responsibility of the Statistics
Authorities appointed under Section 4 of the Industrial Statistics
Act, by State Governments. The Authorities collect the returns
from factories, verify them, and forward corrected copies to the
Directorate of Industrial Statistics, where the results of the
census are finally compiled and published.

The census is conducted with reference to lists of factories,
classified according to industry, furnished to the proper Statistics
Authorities by the Chief Inspector of Factories. The
criterion of classification of a factory is the value of its principal products.

The factories submit their returns in forms prescribed for
the purpose by the census of manufacturing industries rules.
Every form consists of six parts of which the first two parts are
identical for all industries. The information which each of
these parts supplies is as under :

Part A : General Information.
Part B : Capital Structure.
Part C : Persons employed; salaries and wages paid, and
         other contributions made to employees during the
         year.
Part D : Fuel, electricity, coal, etc., purchased and consumed
         during the year.
Part E : Materials purchased and consumed during the year
         in the manufacture of products.
Part F : Produce during the year.

A. General Information. Under this head the information
is given with regard to the name, address, business, etc., of the
factory.

B. Capital Structure. Regarding this factories are to supply
information regarding their productive capital classified as
Fixed and Working, mentioning important components under
each of these heads. In the reports of the census of Indian
Manufactures statewise and industrywise breakdowns of the
productive capital are provided both in the form of diagrams
as well as numerical data.

C. Persons Employed. The employees are divided into two
categories : (i) wage-earners, and (ii) others. Information
with regard to numbers classified as adults or children and also
according to sex is provided in the report. The information
is given for each industry as well as for each state. The total
persons employed include all administrative, technical and
clerical staff, working within the factory area excluding persons
employed in sales organisation.

Daily attendance is supposed to mean the total of the
attendance of each shift for a day. The average number of
persons employed per day is obtained by dividing the total
attendance on all working days by the total number
of working days. Wages and salaries include cost of living
and dearness allowances, overtime earnings and other additional
payments or benefits.

Money values of any privileges or benefits or contributions 
not paid in cash, but capable of being estimated in terms of 
money and which accrue to individual employees, are included 
in the wages. 

D. Fuel, Electricity, Coal, etc. Information regarding
quantities and values of fuel, viz., coal, coke, etc., quantity
of electricity consumed and quantity of water used is supplied
under this heading. Electricity generated within the limits
of the factory is excluded while electricity sold outside is to be
included.

E. Cost of Raw Materials Consumed and Purchased. The
cost of raw materials includes data on the quantity and value
of each material other than fuel, electricity, coal-gas, lubricating
gas and water purchased and consumed during the year
in the manufacture of products and by-products made for
sale. Only materials bought from outside are included.

F. Produce during the Year. Total value and quantity of
products and by-products made in the factory are to be
furnished. Information is also provided with regard to value
added by each industry.

Index Numbers of Industrial Production

The function of such an index is to cover the gap between
inter-censal periods. Statistics of production from important
industries which may be representative should be collected
yearly and converted into a single index of production. Weights
should be assigned on the basis of relative importance of
industries as measured by net output, i.e., value added by
manufacture.

Capital, a weekly journal of Calcutta, has been publishing
every month an Index of Indian Industrial Activity since
March 1930. The base year is 1935 and the series selected and
weights employed are as follows :

                                               Weight
Industrial Production
    1. Cotton Manufactures                        9
    2. Jute Manufactures                          6
    3. Steel Ingots                               5
    4. Pig Iron                                   8
    5. Cement                                     5
    6. Paper                                      [illegible]
Mineral Production
    1. Coal                                       7
    2. Rail and River-borne Trade                24
Financial Statistics
    Cheque clearances                            20
Trade: Foreign and Coastal
    1. Exports                                    4
    2. Imports                                    3
Shipping: Foreign and Coastal
    1. Tonnage cleared                            3
    2. Tonnage entered                            3

‘Trade: Foreign and Coastal’ and ‘Shipping: Foreign and Coastal’
have been excluded since March 19[illegible]. Instead, [illegible]
with weight [illegible] and consumption of electricity
with weight 7 have been included. The weighted geometric
mean is used.
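
The weighted geometric mean mentioned above can be illustrated with a short sketch. The relatives and weights used here are hypothetical figures chosen for illustration and are not taken from the Capital index; the function name is likewise an assumption.

```python
import math


def weighted_geometric_mean(relatives, weights):
    """Return exp(sum(w * ln r) / sum(w)) for positive relatives r."""
    if len(relatives) != len(weights):
        raise ValueError("relatives and weights must have the same length")
    total_weight = sum(weights)
    log_sum = sum(w * math.log(r) for r, w in zip(relatives, weights))
    return math.exp(log_sum / total_weight)


# Hypothetical relatives (base year = 100) for three series and their weights.
relatives = [120.0, 90.0, 100.0]
weights = [9, 6, 5]
index = weighted_geometric_mean(relatives, weights)
print(round(index, 2))
```

Because the logarithms are averaged, a series that doubles and another of equal weight that halves cancel out exactly, which is not the case with the arithmetic mean.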

Data for the compilation of the index number are taken
from the publications of the Department of Commercial Intelligence
and Statistics and the Statistical Summary of the Reserve
Bank of India. There are several shortcomings in this index.
It does not give a fair idea of the industrial production of the
country. The activities of people living in rural areas are
completely ignored. And so far as urban population is
concerned the index is not fully representative because it totally
ignores industries like sugar, hides and skins and tea which
are so important in the industrial economy of our country.

Interim Index of Industrial Production. A series of Index Numbers of
Industrial Production is available since 1947. This series is
called ‘Interim Index of Industrial Production’ and was compiled
by the Directorate of Industrial Production, Ministry of
Commerce and Industry. It was discontinued from April 1956.
This index included 20 industries and was issued every month
for each industry and all the twenty industries combined.

The base year of the series was the year 1946.

The data used for the series were obtained from Monthly
Statistics of Production of Selected Industries compiled by the
Directorate of Industrial Statistics.

It was a weighted index and the weights allotted to the
different industries were in proportion to the values added by
manufacture by them. These values were obtained from the
Report of the First Census of Manufactures, 1946. In order
that these values could be used as weights the industrial classification
followed in the Census of Manufactures was adopted
for purposes of this index also. The formula used is the
weighted mean of quantity relatives, i.e.,

$$I = \frac{\sum W \left( q_1 / q_0 \right)}{\sum W}$$

where $q_1 / q_0$ is the quantity relative of an industry and $W$ the
weight assigned to it.

The table below gives the names of industries which were
included in this series of index numbers and weights that were
assigned to them for this purpose:


Industry or Group                              Weight assigned

 1. Coal                                            11.95
 2. Sugar                                            5.54
 3. Paints and Varnishes                              .61
 4. Cement                                            .67
 5. Glass                                             .55
 6. Refractories                                      .48
 7. Plywood                                           .15
 8. Paper and Paper board                            1.46
 9. Matches                                          1.21
10. Cotton Textiles                                 43.49
11. Woollen Manufactures                             1.38
12. Jute                                            16.53
13. Chemical                                         3.10
14. Non-ferrous metals                               2.14
15. Steel                                            7.16
16. Bicycles                                          .11
17. Sewing Machines                                   .02
18. Electric Lamps                                    .04
19. Electric Fans                                     .35
20. General Engineering and Electrical
    Engineering                                      5.06

    Total                                          100.00


Revised Index of Industrial Production. The Interim Index of
Industrial Production, with 1946 as its base, was discontinued
from April 1956. In its place, a Revised Index of Industrial
Production with 1951 as base is being published since October
1955.

Coverage. The Revised Index is based on the data of the
production of selected industries of (1) Mining and Quarrying,
(2) Manufacturing, and (3) Electricity, gas and steam.

These industries have been divided into 18 groups (U.N.
Major Groups), and cover 88 items as against the 36 items
covered by the interim index. These have been classified in
accordance with the International Standard Industrial Classification
of All Economic Activities.

Base. The working party on base periods of official index
numbers, appointed by the standing committee of departmental
statisticians, recommended the calendar year 1951 for the base
of the Index of Industrial Production and consequently the
same has been adopted as the base for these indices.

Weights. Weights have been assigned to the various items
in proportion to the value added by manufacture. The value
added by manufacture has been obtained from different sources
for different items. In most of the cases they have been
obtained from the Census of Indian Manufactures 1951 and the Sample
Survey of Manufacturing Industries. The weights determined
on the basis of values added by manufacture are distributed to
the sub-items in proportion to their gross values.

Index. The Index is a simple weighted arithmetic mean
calculated by the formula:

$$I = \frac{\sum R_m W_m}{\sum W_m}$$

where $R_m$ is the production relative for the item for the
month and $W_m$ the weight allotted to it.

Adjustment for the variation in the number of days in the month.
The Index has been adjusted for variations in the month by
the formula:

$$R'_m = R_m \times \frac{\text{average number of days of a month in the year}}{\text{number of days in the month}}$$

where $R'_m$ and $R_m$ are adjusted and unadjusted production
relatives of the item.

Such adjustments have, however, not been made for Tea,
Sugar and Salt.
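
The two steps described above, adjusting each production relative for the length of the month and then taking the weighted arithmetic mean, can be sketched as follows. This is only an illustration: the relatives and weights are hypothetical, the function names are assumptions, and a 365-day year is assumed for the average length of a month.

```python
def adjust_for_days(relative, days_in_month, days_in_year=365):
    """Scale a monthly production relative to a month of average length."""
    average_days = days_in_year / 12
    return relative * average_days / days_in_month


def weighted_index(relatives, weights):
    """Weighted arithmetic mean of production relatives."""
    return sum(r * w for r, w in zip(relatives, weights)) / sum(weights)


# Hypothetical unadjusted relatives for February (28 days) and item weights.
unadjusted = [110.0, 95.0, 130.0]
weights = [40.0, 15.0, 5.0]
adjusted = [adjust_for_days(r, 28) for r in unadjusted]
index = weighted_index(adjusted, weights)
print(round(index, 2))
```

A short month such as February has its relatives raised by the adjustment, while a 31-day month has them lowered, so that months of different length become comparable.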

Adjustment for Seasonal Variations. The items Sugar, Salt
and Tea exhibit marked seasonal variations. Adjustments for
seasonal fluctuations of these items have been made by using
the following seasonal indices computed by the method of
moving averages, based on the data of 14 years in the case of
Sugar and of 6 years in the case of Salt and Tea:


Month            Sugar        Salt         Tea

January         286.81       33.22       11.75
February        [figures missing]
March           238.34      129.75       17.90
April            91.23      215.28       67.25
May              23.79      300.67       93.66
June              3.88      230.70      137.15
July              1.33       74.59      135.77
August            1.93       32.66      180.21
September    [illegible]     32.75      176.79
October           5.62       23.22      181.37
November         28.43       21.56      119.20
December        188.33       35.41       39.77
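
The text does not spell out how the seasonal indices are applied. A common convention, assumed here, is to divide the production relative of the month by the seasonal index of that month and multiply by 100. The sketch below uses three of the indices from the table above; the production relatives and the function name are hypothetical.

```python
# Seasonal indices for selected months, taken from the table above
# (an average month corresponds to 100).
SEASONAL_INDEX = {
    ("Sugar", "January"): 286.81,
    ("Salt", "May"): 300.67,
    ("Tea", "August"): 180.21,
}


def deseasonalise(relative, item, month):
    """Remove the seasonal component from a production relative.

    Assumes the usual ratio convention: adjusted = relative * 100 / index.
    """
    return relative * 100.0 / SEASONAL_INDEX[(item, month)]


# Hypothetical unadjusted relative for sugar production in January.
print(round(deseasonalise(300.0, "Sugar", "January"), 2))
```

Under this convention a month whose output is exactly at its usual seasonal level comes out at 100 after adjustment.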


The New Revised Index of Industrial Production 

It has been felt for some time that the latter, i.e., the current
Index, could not adequately represent the recent industrial
growth because of (a) remoteness of base year, (b) the weighting
pattern based on data for that year, and (c) rapid development
of several new items of production in recent years. Accordingly
a Working Group was set up to study and recommend the lines
of its revision. Based on the recommendations of the Working
Group the index of industrial production has again been
revised with 1956 as base, replacing the index with 1951 as base
as from July 1962.

Coverage. The new index of industrial production is
based on 201 items of production as compared to 88 items in
the earlier index. The increase is not all due to the inclusion
of new items; some items included in the previous index have
been subdivided into sub-items. The items of the new index
are classified according to the ‘National Classification of All
Economic Activities’, finalised recently by the C.S.O.

Base. The calendar year 1956 has been adopted as the
base for the new index.

Weights. Weights have been allotted to different items
in proportion to the value added by manufacture in 1956. The
value added by manufacture has been obtained as stated
below:


(i)   (1) 233-Spinning, weaving and finishing of jute manufactures.
      (2) 341-Iron and Steel basic industries.

      The value added figures for the groups have been obtained
      from the Census of Manufactures 1956.

(ii)  (1) 2061-Wheat flour.
      (2) 2062-Biscuits.
      (3) 2091-Sugar.
      (4) 2102-Confectionery.
      (5) 2110-Manufacture of hydrogenated oils (Vanaspati).
      (6) 255-Spinning, weaving and finishing of woollen textiles.
      (7) 25-Manufacture of wood and cork except manufacture of
          furniture.
      (8) 27-Manufacture of paper and paper boards.
      (9) 3113-Synthetic resins and [...]
      (10) 313-Cotton seed oils.
      (11) 314-Manufacture of paints, varnishes and lacquers.
      (12) 3132-Fine chemicals.
      (13) 3171-Soap.
      (14) 319-Glue, glycerine, alum and liquid gold only.
      (15) 331-Manufacture of structural [...]
      (16) 332-Manufacture of glass products.
      (17) 333-Manufacture of pottery, china and earthenware.
      (18) 3340-Cement.
      (19) 3395-Grinding wheels.
      (20) 3491-Antimony and lead (virgin metal) only.
      (21) 3492-Alloys, not elsewhere classified.
      (22) 3493-Basic forms, not elsewhere classified.
      (23) 3530-Manufacture of bolts, nuts, screws, nails, etc.
      (24) 3592-Razor blades.
      (25) 3599-Metal products, not elsewhere classified.
      (26) 368-Manufacture of commercial office and household
           machines.
      (27) 375-Manufacture of electrical appliances.
      (28) 386-Manufacture of bicycles, tricycles and perambulators.
      (29) 389-Manufacture of transport equipment not elsewhere
           classified: (i) Three wheelers.

      A suitable proportion of value added by manufacture to
      gross value of output has been determined for each of
      these items from the Census of Manufactures, 1956. The
      value added by manufacture for these major groups and
      items has then been estimated by multiplying their gross
      value of output in 1956, as supplied by the Ministry of
      Commerce & Industry, by this proportion.

(iii) (1) 2270-Cigarettes.
      (2) 2341-Art silk yarn.
      (3) 30-Manufacture of rubber products.
      (4) 3111-Heavy organic chemicals.
      (5) 3112-Heavy inorganic chemicals.
      (6) 3117-Dyestuff.
      (7) 3122-Manufacture of inorganic fertilisers.
      (8) 3151-Drugs and pharmaceuticals.
      (9) 32-Manufacture of petroleum products.
      (10) 3391-Asbestos (cement sheets).
      (11) 3540-Manufacture of hand tools and small tools.
      (12) 3594-Hurricane lanterns.
      (13) 3632-Internal combustion engines.
      (14) 3641-Textile machinery.
      (15) 3659-Industrial machinery n.e.c.
      (16) 3667-(i) Ball bearing.
      (17) 3711-Power transformers.
      (18) 3721-Electric motors.
      (19) 374-Manufacture of batteries.
      (20) 3766-Radio receivers.
      (21) 383-Manufacture of motor vehicles (automobiles).
      (22) 385-Manufacture of motor cycles and scooters.
      (23) 389-Manufacture of transport equipment not elsewhere
           classified: (ii) Trailers.

      In the case of these major groups, groups and items, the
      proportion of value added to gross value of output has
      been ascertained either from the Report on Sample Survey
      of Manufacturing Industries, 1953 or the Report on Sample
      Survey of Manufacturing Industries, 1956, Survey of
      Scheduled Industries. The gross value of output in 1956,
      as available from the Ministry of Commerce and Industry,
      of each of these items has been multiplied by the
      appropriate proportion calculated from the SSMI Reports.
      The product has been taken as an estimate of the value
      added by manufacture.
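
The estimation rule used for these groups, gross value of output multiplied by a proportion of value added to gross value, can be sketched as follows. The item names are taken from the list above, but every figure is hypothetical and the function name is an assumption made for illustration.

```python
def estimate_value_added(gross_value, proportion):
    """Estimate value added as gross value of output times a proportion."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must lie between 0 and 1")
    return gross_value * proportion


# Hypothetical gross values of output (Rs. crores) and hypothetical
# proportions of value added to gross value, such as a survey might give.
gross_value_1956 = {"Cigarettes": 40.0, "Electric motors": 12.0}
proportion = {"Cigarettes": 0.25, "Electric motors": 0.5}

value_added = {
    item: estimate_value_added(gross_value_1956[item], proportion[item])
    for item in gross_value_1956
}
print(value_added)
```

The estimates obtained in this way serve as the basis for the weights, since the weights are allotted in proportion to value added.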

(iv)  (1) 1210-Iron ore.
      (2) 3491-Metals not elsewhere classified (gold only).

      The Indian Bureau of Mines furnished the figure of
      proportion of value added to gross value of output. Their
      estimated gross value of output in 1956 has been
      multiplied by the corresponding proportion of value added
      to gross value. The product has been taken as the value
      added by manufacture.

(v)   (1) 1110-Coal.
      (2) 3822-Railway wagons.

      From statistics collected directly from certain concerns,
      the proportion of value added to gross value has been
      estimated. The gross value of output in 1956 of these two
      items has been multiplied by the corresponding proportion
      estimated to arrive at the value added by manufacture.


(vi)  (1) 2130-Tea processing.

      With the help of returns of the Annual Survey of
      Industries 1959, particulars available in Reports on
      Sample Survey of Manufacturing Industries and statistics
      furnished by the Indian Tea Association, Calcutta, the
      proportion of value added to gross value of output has
      been ascertained. The estimate of gross value of output
      in 1956 of the Tea industry has been multiplied by this
      proportion to arrive at the value added by manufacture.

(vii) (2) 2140-Coffee curing.

      The value added by manufacture has been estimated from
      information supplied by the Coffee Board.

(viii) (3) Salt.

      The Salt Commissioner furnished certain particulars from
      which the proportion of value added to gross value has
      been determined. The gross value of output of salt in
      1956 has been multiplied by this proportion to arrive at
      an estimate of the value added by manufacture.


(ix)  (4) 2322-Weaving of cotton textiles: (ii) Decentralised sector.
          2342-Rayon fabrics.

      The estimate of the value added by manufacture has been
      supplied by the Textile Commissioner.


(x)   (5) 2321-Cotton spinning.
          2322-Weaving of cotton textiles: (i) Mill sector.

      The value added by manufacture by the industry as a whole
      was available from the Census of Indian Manufactures,
      1956. The Census estimate relates to the Mill sector
      only. With the help of Census data, the value added has
      been split up into (a) value added by production of yarn
      at spindle point and (b) value added by weaving of cloth
      from yarn (on the lines described below), and a
      proportion between the two has been ascertained.
      Thereafter, a ratio of the quantity of cloth as published
      in the Monthly Statistics to the quantity of woven
      piecegoods as reported in the Census has been computed
      for the year 1956, and the Census estimate of value added
      by Cotton Textiles has been inflated by this ratio. The
      resulting value added has been distributed to yarn and
      cloth in the same proportion as their estimated value
      added from the Census data.

      The value added by production of yarn at spindle point
      and that by weaving cloth has been estimated as follows
      from Census data of 1956. The yarn consumed for weaving
      has been estimated by subtracting ‘Yarn made for sale’
      from the total quantity of yarn at spindle point. It has
      been multiplied by the rate of ex-factory selling value
      of yarn made for sale and the product has been deducted
      from the gross value of woven piecegoods. The difference
      has been taken as a rough measure of value added by
      weaving from yarn. This estimate of value added has been
      deducted from the Census estimate of value added for the
      industry in 1956. The difference has been taken as the
      value added by spinning yarn from cotton. The proportion
      of value added by spinning to value added by weaving has
      been computed. The estimated value added by the industry
      has been split up in this proportion as stated earlier.
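
The splitting of the cotton textile value added between spinning and weaving follows a chain of subtractions and one inflation step. The sketch below retraces that chain; all quantities and values are hypothetical, and the function and argument names are assumptions made for illustration.

```python
def split_cotton_value_added(total_yarn_qty, yarn_for_sale_qty, yarn_price,
                             gross_value_piecegoods, census_value_added,
                             cloth_qty_monthly_stats, cloth_qty_census):
    """Return (value added by spinning, value added by weaving)."""
    # Yarn consumed in weaving = total yarn less yarn made for sale.
    yarn_consumed = total_yarn_qty - yarn_for_sale_qty
    # Value added by weaving = gross value of piecegoods less the value of
    # yarn consumed, priced at the ex-factory rate of yarn made for sale.
    va_weaving = gross_value_piecegoods - yarn_consumed * yarn_price
    # Value added by spinning is the remainder of the Census estimate.
    va_spinning = census_value_added - va_weaving
    # Inflate the Census estimate by the ratio of cloth quantities.
    inflated = census_value_added * cloth_qty_monthly_stats / cloth_qty_census
    # Distribute the inflated figure in the spinning : weaving proportion.
    share_spinning = va_spinning / census_value_added
    return inflated * share_spinning, inflated * (1.0 - share_spinning)


# Hypothetical figures, in arbitrary units.
spinning, weaving = split_cotton_value_added(
    total_yarn_qty=1000.0, yarn_for_sale_qty=200.0, yarn_price=2.0,
    gross_value_piecegoods=2000.0, census_value_added=1000.0,
    cloth_qty_monthly_stats=110.0, cloth_qty_census=100.0,
)
print(round(spinning, 2), round(weaving, 2))
```

With these figures the Census estimate of 1000 divides into 600 for spinning and 400 for weaving, and the inflated total of 1100 is shared out in the same 60 : 40 proportion.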


(xi)  (6) 241-Footwear leather.
          291-Tanneries and leather finishing plants.
          2991-Pickers.
          2999-Leather cloth.

      The value added per thousand units of quantity has been
      computed from information furnished by an important
      manufacturing concern. The total physical output in 1956
      has been multiplied by this rate of value added. The
      product has been taken as the total value added in 1956.

(xii) (7) 3180-Manufacture of matches.

      The Ministry of Commerce and Industry have furnished the
      value added by manufacture.

(xiii) (8) 342-Copper smelting and rolling mills.
           343-Aluminium manufacture.
           344-Brass manufacture.

      The estimate of value added by manufacture by ‘Aluminium,
      Copper and Brass Industry: CMI (22)’ in 1956, published
      in the Eleventh Census of Indian Manufactures, has been
      distributed to these three items in proportion to the
      gross value of their output.
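
Distributing a single value-added figure over several items in proportion to their gross values of output amounts to a simple pro-rata allocation. A minimal sketch follows; the total and the gross values are hypothetical, and the function name is an assumption.

```python
def distribute_in_proportion(total_value_added, gross_values):
    """Share out a total over items in proportion to their gross values."""
    total_gross = sum(gross_values.values())
    return {
        item: total_value_added * gross / total_gross
        for item, gross in gross_values.items()
    }


# Hypothetical gross values of output (Rs. lakhs) for the three items.
gross_values = {"Copper": 50.0, "Aluminium": 30.0, "Brass": 20.0}
shares = distribute_in_proportion(40.0, gross_values)
print(shares)
```

The same allocation applies whenever weights determined for a group have to be passed down to its sub-items in proportion to gross values.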



(xiv) (9) 373-Manufacture of electric cables and wires.
          3790-Manufacture of electrical machines, appliances and
          supplies, n.e.c.

      The Economic Adviser's office furnished the proportion of
      value added to gross value of Electrical Engineering. The
      gross value of output in 1956 has been multiplied by this
      proportion to estimate the value added by manufacture.

(xv)  (10) 5100-Electricity.

      The value added has been estimated from the Revenue
      Account of the Public Electricity Supply, General Review,
      December 1959, published by the Central Water and Power
      Commission, Simla.

5. The total value added in 1956 by all the 201 items
included in the new revised index stands at Rs. 736 crores.
The total value added by 88 items of the earlier index was
Rs. 448 crores in 1951 and Rs. 573 crores in 1956. The value
added in 1956 by the items of the new revised index and by
the items of the current index worked out to be 90 per cent
and 70 per cent respectively of the net output of factory
establishments and mining sector in India during 1956-57.¹ The
value added in 1951 by the items of the current index worked
out to be 92 per cent of the net output of factory establishments
and mining sector in India during 1951-52.²
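
The coverage percentages quoted in this paragraph can be cross-checked by working back to the net output they imply. The sketch below uses only the figures given in the paragraph; the function name is an assumption.

```python
def implied_net_output(value_added_crores, coverage_per_cent):
    """Net output implied by a value-added figure and its coverage."""
    return value_added_crores * 100.0 / coverage_per_cent


new_index_1956 = implied_net_output(736, 90)      # new revised index, 1956
current_index_1956 = implied_net_output(573, 70)  # current index, 1956
current_index_1951 = implied_net_output(448, 92)  # current index, 1951

print(round(new_index_1956), round(current_index_1956),
      round(current_index_1951))
```

Both 1956 figures point to a net output of roughly Rs. 818 crores, so the two percentages are consistent with each other once rounding is allowed for.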

6. The number of items and their weights by Divisions/ 
Major Groups of Industry for the current and revised indices 
are as follows : 

¹ Reference: pages 96 and 107 of the blue book ‘National Income Statistics’
issued by the Central Statistical Organisation.

² Source: National Income Division of the Central Statistical Organisation.


[Table: number of items and weights by Divisions/Major Groups of
Industry for the Current Index and the Revised Index; the figures are
not recoverable.]

The figure for 1956 works out to be less than the figure in 1951 due to a revised basis of estimation; the
value added figures of certain items were found to be somewhat over-estimated in the earlier Index.

7. Index. The index is a simple weighted arithmetic
mean calculated by the formula:

$$I = \frac{\sum R_i W_i}{\sum W_i}$$

where $I$ is the index, $R_i$ is the production relative for the item $i$ for the
month in question and $W_i$ the weight allotted to it.

8. The production series for certain items were not available
for all the years from 1951 to 1958. In such cases group
indices were based on the weighted average of relatives of items
for which production figures were available, excluding the
weights of items for which figures were not available. But in
computing the major group indices from group indices or the
general index from major group indices no such adjustments in
weights were made.
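
The treatment of missing items described in this paragraph can be sketched as follows: at group level the weights of unavailable items are left out of both numerator and denominator, while at higher levels the full weights are always used. The relatives and weights are hypothetical and the function names are assumptions.

```python
def group_index(relatives, weights):
    """Weighted mean over the items whose relatives are available.

    A relative of None marks an item with no production figure; its
    weight is excluded from the calculation.
    """
    available = [(r, w) for r, w in zip(relatives, weights) if r is not None]
    if not available:
        return None
    return sum(r * w for r, w in available) / sum(w for _, w in available)


def aggregate_index(indices, weights):
    """Weighted mean with full weights (major group or general index)."""
    return sum(i * w for i, w in zip(indices, weights)) / sum(weights)


# Hypothetical group of three items, the second with no figure available.
group = group_index([100.0, None, 130.0], [2.0, 5.0, 1.0])
general = aggregate_index([group, 90.0], [3.0, 1.0])
print(group, general)
```

Leaving out the weight of a missing item is equivalent to assuming that the item moved in line with the rest of its group.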

9. Adjustment for Seasonal Variation. The index was calculated
for each month from January 1951 to December 1961 as
explained in paras 7 and 8 above. The series of indices thus
obtained was used in computing seasonal indices by the
method of moving averages. The computed seasonal indices
are shown below:




Month            Jan.     Feb.     March    April    May      June
Seasonal index   101.38   97.48    101.08   98.70    98.88    97.38

Month            July     August   Sept.         Oct.          Nov.     Dec.
Seasonal index   102.13   101.42   [illegible]   [illegible]   97.90    104.02


The general index of every month is adjusted for seasonal 
variation by using the above indices. 

10. At the end of each year seasonal indices will be computed
covering that year also. The seasonal indices thus
revised will be employed for seasonal adjustment of the monthly
indices published.
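
The text names the method of moving averages without giving its steps. One common form of it, assumed here, is the ratio-to-moving-average procedure: each monthly figure is expressed as a percentage of a centred twelve-month moving average, the percentages are averaged month by month, and the twelve averages are scaled to total 1200. The sketch assumes that the series begins in January; the data are hypothetical.

```python
def seasonal_indices(monthly_values):
    """Seasonal indices by the ratio-to-moving-average method.

    monthly_values must start in January and cover at least 24 months.
    Returns twelve indices (January first) that sum to 1200.
    """
    n = len(monthly_values)
    if n < 24:
        raise ValueError("need at least 24 monthly observations")
    ratios = {month: [] for month in range(12)}
    for t in range(6, n - 6):
        # Centred 12-month average: half weight on the two end months.
        inner = sum(monthly_values[t - 5:t + 6])
        centred = (0.5 * monthly_values[t - 6] + inner
                   + 0.5 * monthly_values[t + 6]) / 12.0
        ratios[t % 12].append(100.0 * monthly_values[t] / centred)
    means = [sum(ratios[m]) / len(ratios[m]) for m in range(12)]
    scale = 1200.0 / sum(means)
    return [mean * scale for mean in means]


# Hypothetical index series: three years with a fixed seasonal pattern.
pattern = [110.0, 90.0] * 6
indices = seasonal_indices(pattern * 3)
print([round(value, 2) for value in indices])
```

Recomputing the indices after each additional year, as paragraph 10 describes, only requires calling the function again on the lengthened series.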






















Trade Statistics

There are two kinds of trade, viz., External and Internal.

External trade is divided into: (i) Air-borne trade with
foreign countries, (ii) Trade along land frontiers, and
(iii) Sea-borne trade.

Internal trade is divided into: (i) Coastal trade, and
(ii) Road, Rail, River-borne trade.

Foreign Trade Statistics

Statistics of the foreign trade of a country may be broadly
classified into two categories: (i) General Trade Statistics, and
(ii) Special Trade Statistics.

General Trade Statistics refer to figures of all imports and
exports of a country.

Special Trade Statistics refer to figures of imports for
domestic consumption, exports of domestic goods, and figures
of re-exports.

Foreign trade statistics are collected for the following
purposes:

1. To estimate the balance of payments position and to
determine the size and change in the balance of foreign
exchange holdings;

2. To prepare and administer the barter and other trade
agreements between countries;

3. To identify markets and to plan campaigns for the
expansion of sales; and

4. To estimate the national income of a country.

Foreign trade statistics are very essential for judging the
direction of trade. In arriving at trade agreements, it is necessary
to know the commodities exported to and imported from
particular countries. Thus the figures of exports and imports
must be classified commodity-wise and country-wise. To attain
uniform commodity classification, there is a list prepared by
the U.N.O. This list is intended as a framework which can be
expanded or contracted by each country for its own use. But,
for international comparability, it is necessary to preserve this
framework.

As for classification by country it is necessary to decide the
stage at which the country is to be taken into account. For
exports the following stages are possible:

(i) Country of final destination,

(ii) Country of sale,

(iii) Country of immediate destination.

For imports the stages are:

(i) Country of purchase,

(ii) Country of last shipment.

Sources of Information. Statistics of imports and exports are
a by-product of the customs administration of a country. The
reliability of these statistics depends upon the efficiency of the
customs department. Therefore, a detailed knowledge of
customs procedures is essential to the person working on foreign
trade statistics. In order to compile both the general and
special trade statistics, customs documents are needed at various
stages. For statistical purposes these should be separately
identified and the following information should be collected:

1. The date of the transaction,

2. Description of the commodity,

3. Quantity of the commodity giving the units or measures
used,

4. Country of origin for imports and country of destination
for exports, and

5. Assessed value and duty paid.

India's Foreign Trade Statistics

Sources of Information. The statistics of India's foreign trade
are compiled and published by the Department of Commercial
Intelligence and Statistics. Below is given a brief description
of the two important publications that contain our foreign
trade statistics:

(1) Monthly Statistics of the Foreign Trade of India. The
name of the publication containing the monthly foreign
trade statistics of India prior to January 1957 was ‘Accounts
relating to the Foreign Trade and Navigation of India’. This
has now been changed to ‘Monthly Statistics of the Foreign
Trade of India’. Besides the change in the name of the publication,
another important change introduced in January 1957
was a change in trade classification. The trade classification in
use prior to January 1957 provided for the separate specification
of only 1,717 items in the foreign trade statistics. This classification
has now been replaced by the ‘Indian Trade Classification’,
which provides for the specification of more than 4,850 articles and is
based on the International Trade Classification recommended
by the Economic and Social Council of the United Nations.
This new classification has been adopted for all the three forms
of trade, viz., sea, land and air, and as such it has been possible
to combine the figures and to present statistics of the entire
foreign trade in one series of tables. Before 1957 figures of land
trade were shown separately and could not be combined
because the classification of land trade differed from that for
the sea and air-borne trade.

With a view to ensuring convenience in handling, this publication
now appears in two volumes, viz.,

Vol. I : Exports and Re-exports.

Vol. II : Imports.

Scope of the Publication. The figures of trade given in this
publication are of the foreign trade registered by Customs
authorities at Indian Sea ports, Air ports and Land Customs
Stations. The land-borne trade with Tibet, Nepal, Bhutan
and Sikkim, and the trade arising in the Andaman and Nicobar
Islands and the Laccadive, Minicoy and Amindivi Islands are
excluded.

System of Recording. The ‘general system’ of statistical
recording is followed. Thus, ‘Imports’ comprise goods brought
across the customs frontier whether they are intended for home
consumption, bonding or re-exportation. ‘Exports’ mean
exports of Indian merchandise. ‘Re-exports’ mean exports of
foreign merchandise previously imported into India.



Details of Statistics. The statistics in this publication relate
to the quantity and value of commodities in trade with foreign
countries.

(a) Quantities. The figures of quantity are based on the
declarations made by importers on Bills of Entry and by exporters
on Shipping Bills, and subsequently checked by the Customs
Officers. The figures represent net weight exclusive of packing.

(b) Values. The values are those adopted by the Customs
authorities for their purposes and are based on wholesale
market prices. They represent the wholesale cash price for which
goods of the like kind and quality are sold or are capable of
being sold at the time and place of importation or exportation,
as the case may be, without any deduction except (in the case
of goods imported) of the amount of the duties payable on the
importation.

Countries to which Imports and Exports are Credited. (a) Imports
are classified as received from the countries of consignment,
viz., countries from which goods have come whether by land,
sea or air without interruption of transit save in the course of
transhipment or transfer from one means of conveyance to
another. The countries of consignment are not in all cases
countries where the goods have been produced.

(b) Exports are credited to the country of final destination,
i.e., the country to which goods are intended to pass without
interruption of transit save in the course of transhipment from
one means of conveyance to another.

The list of countries adopted for recording India's trade
is based on the International Convention relating to Economic
Statistics.

The Imports on Government Account. Owing to the ‘Note
Pass’ procedure obtaining in Custom Houses for the clearance
of Government imports, there is a delay in the receipt in this
Department of the full particulars. The figures are included
in the progressive totals upon receipt; meanwhile import figures
are deficient to that extent.

(2) Supplement to Monthly Statistics of the Foreign Trade of India.
The ‘Supplement’ provides information relating to the following:

(a) Value of foreign trade,

(b) Overall balance of trade,

(c) Foreign trade of Customs Zones,

(d) Foreign trade with each country and Currency Area,

(e) Foreign trade in groups of commodities with each
Currency Area,

(f) Index Numbers,

(g) Value of principal articles of export and import,

(h) Foreign trade in treasure, and

(i) Foreign trade with selected countries.

Internal Trade 

(i) Coastal Trade. Statistics of our coastal trade are
published in ‘Accounts relating to Coastal Trade and Navigation
of India’.

The Department of Commercial Intelligence and Statistics
issues this publication monthly. It contains information regarding
coastal trade as registered in Indian ports. Data are
compiled from the shipping bills that are checked by the customs
house. Both internal and external trades are considered
in these accounts. Here internal trade means trade between
ports within the same state and external trade signifies the trade
between two different states.

(ii) Road, Rail and River-borne Trade. In India official
statistics with regard to Road Trade are not available. The
data relating to Rail and River-borne Trade are given in
‘Accounts relating to the Inland (Rail and River-borne) Trade
of India’. It is a monthly publication of the Department of
Commercial Intelligence and Statistics. These Accounts exhibit
the quantity figures of the trade in selected commodities moving
by rail and inland steamer.

For the purpose of these Accounts, India has been divided 
into 36 Trade Blocks. 

The basic documents for the statistics in this publication
are the invoices relating to consignments of the selected commodities
received at each railway and steamer station from
Trade Blocks other than the one in which it is situated. Consignments
received at a station from stations within the same Trade
Block and consignments received at the station en route to
another station, whether within or without the same Trade
Block, are not taken into account.

Each Railway and Steamer Company consolidates the
figures in respect of the stations with which it is concerned,
and submits monthly returns to the Department of Commercial
Intelligence and Statistics.

The figures in these accounts are the figures of the actual 
imports into each Trade Block. No attempt is made to collect 
the export figures as, in inland trade, the total exports of one 
Trade Block are represented by the sum of the imports from 
that Trade Block into all other Trade Blocks. 
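
The reason export figures need not be collected separately can be shown with a small sketch: given the imports into each Trade Block classified by block of origin, the exports of any block are obtained by summing over the receiving blocks. The block names are taken from the text that follows, but the quantities are hypothetical and the function name is an assumption.

```python
def exports_by_block(imports):
    """Derive exports of each Trade Block from recorded imports.

    imports[destination][origin] is the quantity received in the
    destination block from the origin block.
    """
    exports = {}
    for destination, by_origin in imports.items():
        for origin, quantity in by_origin.items():
            if origin == destination:
                continue  # movements within a block are not recorded
            exports[origin] = exports.get(origin, 0) + quantity
    return exports


# Hypothetical quantities of one commodity received in two blocks.
imports = {
    "Calcutta": {"Assam": 5, "Bihar": 7},
    "Bihar": {"Assam": 3, "Calcutta": 4},
}
print(exports_by_block(imports))
```

In other words the import returns already form a complete origin-destination table, and exports are simply its totals read the other way.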

In the case of River-borne trade, the statistics cover
the trade carried by certain steamer companies between a few
Trade Blocks only. These are: (1) Calcutta, (2) Assam, (3) West
Bengal (excluding Calcutta), (4) Bihar, and (5) Uttar Pradesh.

The Indian Trade Journal. It is a weekly publication of the
Department of Commercial Intelligence and Statistics. It
contains the following information:

(i) Weekly figures of exports and imports of selected commodities,
the data being collected from shipping bills.

(ii) Weekly despatches and arrivals of certain staple
commodities at selected centres. Figures published
are taken from reports which are given by the railways
and steamer authorities.

(iii) Monthly foreign sea-borne trade of India. It is in
a summary form giving figures of exports and imports,
private merchandise and visible items of trade.

Raw Cotton Trade Statistics. It is a monthly journal and
supplies data on imports of raw cotton as well as its exports
from India. It contains the data on imports for each state.

Statistical Abstract (Annual). The Department of Commercial
Intelligence and Statistics published the Statistical Abstracts up
to 1941-43. During the war, its publication was discontinued.
The office of the Economic Adviser attempted to publish it in
1945-46. After 1951 the Central Statistical Organisation has
been publishing the Statistical Abstracts. In this publication the
statistics of trade are classified under four groups:

1. Foreign trade by sea and air,

2. Coastal trade,

3. Inland trade, and

4. Land frontiers trade.

Trade statistics in India are not seriously defective. The
information obtained from the above journals and accounts
and various other publications is more or less satisfactory, and
practically all kinds of details are available. But the information
is published after a long time. Due to this time lag between
the publication year and the year to which it relates, the usefulness
of the data is partly marred. Moreover, total trade in
a particular commodity is not known because the imports and
exports on behalf of the Government and on private account are
separated. The classification of goods is not satisfactory, as
it is not comparable with that necessitated by tariff
classifications.

The National Sample Survey 

The National Sample Survey is a pioneering attempt at the
application of random sampling for collecting information on
a wide range of subjects covering the whole country. The
absence of reliable statistics relating to production, consumption
and other aspects of economic and social life in India has been
known for a long time. But after independence and the
consequent assumption of wide economic and social responsibilities
by the Government of India it began to be realised that the
large gaps that existed in our statistical information were a
serious obstruction in the planned development of the country
and as such could not be allowed to continue. The NSS
(National Sample Survey) was undertaken to fill up these gaps.
An abstract scheme for organising a National Sample Survey
was approved in December 1949 by the Finance Minister and
a Directorate of National Sample Survey was set up under the
Ministry of Finance.




The work of collecting statistical data under this scheme
started on 1st October, 1950 and the first round was completed
in March 1951. Since then the NSS is functioning on a
continuing basis. The surveys are arranged in a number of
successive rounds per year, each round comprising a period of three
months or more. Besides several specialised studies, seven
rounds of survey have been completed by now. The reports
are available only for the first three rounds.

The NSS covers the whole of India with the exception of
Jammu and Kashmir and the Andaman and Nicobar Islands. The
programme is comprehensive but is changed as necessary from
one round to another. The information on economic and
social conditions is collected by the interview method. The
investigators visit the households included in the sample and
make direct inquiries from the householders. In the case of
crops and other items the information is collected by the
investigators by their own personal observations. The field
staff works mostly under the direct control of the Ministry of
Finance. The statistical work (including the sample design,
processing of data, and the preparation of the reports) is done
in the Indian Statistical Institute.

The First Round. The plan of the NSS was drawn with the
idea that repeated investigations, one round after another, will
be carried out every year in a continuous manner. The first
round was started in October 1950 and completed in March
1951. A sample of 1,835 villages scattered all over the country
was selected for investigation. The sample was divided into
two groups of villages each of which was scattered throughout
the country, and two different sets of schedules were used in the
two groups. One set of schedules was prepared in the Indian
Statistical Institute and was applied in the first group of 1,189
villages, and the second set of schedules was prepared by the
Gokhale Institute and was used in the second group. The
Report of the First Round gives information collected in the first
group of villages to which the Calcutta schedules were applied.

The Schedules applied to this group of 1,189 sample villages
may be classified into four types :




(i) Village Schedules. These were used for listing all
households of sample villages ; for collecting information on
land utilisation ; for collecting prices of selected commodities ;
and for collecting rates of skilled and unskilled workers.

(ii) Household Schedules (I). These were used for collecting
general particulars on demographic and economic conditions
such as age, sex, marital status, economic and employment status.

(iii) Household Schedules (II). These were used for a
smaller number of sample households for collecting detailed
information on household enterprises and activities such as
agriculture, industry, crafts, trade, services and profession.

(iv) Household Schedules (III). These were used for a
small number of sample households for collecting detailed
information on consumption in value and in quantities of food,
beverages, fuel and light, rent, clothing and various other items.

The sample units were selected in two stages, namely :
(i) Selection of villages, and (ii) Selection of households.

For the purpose of the selection of villages the entire country
was divided into 256 strata. The allocation of sample villages
in each stratum was in proportion to the rural population of
the stratum. The different phases of sampling in the selection
of the households were :

(1) All the households in a sample village were listed.

(2) A random sample of 80 households was selected for
collecting information relating to occupation.

(3) These 80 households were then classified into two
groups, namely, agricultural and non-agricultural, on the basis
of the particulars collected.

(4) Out of the agricultural households, 8 were selected at
random and in the same manner 8 were selected from
non-agricultural households. For these 16 households the general
particulars schedule was completed.

(5) Two of the 8 agricultural households and 3 of the 8
non-agricultural households were then selected at random to
obtain details of household enterprises.

(6) Of the remaining 6 agricultural households, 1 was
selected and of the remaining 5 non-agricultural 2 were selected
for obtaining detailed information on consumer expenditure.
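
The two stages described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the three strata with their populations, and the 200-household village are all hypothetical, and simple random sampling without replacement is assumed at every phase, since the text says only that households were selected "at random".

```python
import random


def allocate_villages(rural_population, total_villages):
    """Share out sample villages among strata in proportion to rural population.

    Rounding means the allocations may not add up exactly to total_villages.
    """
    total_population = sum(rural_population.values())
    return {
        stratum: round(total_villages * population / total_population)
        for stratum, population in rural_population.items()
    }


def select_households(listed, is_agricultural, rng):
    """Carry out phases (2) to (6) for one sample village."""

    def draw(pool, k):
        # Simple random sample without replacement; never asks for more
        # units than the pool holds.
        return rng.sample(pool, min(k, len(pool)))

    occupation = draw(listed, 80)                                 # phase (2)
    agri = [h for h in occupation if is_agricultural(h)]          # phase (3)
    non_agri = [h for h in occupation if not is_agricultural(h)]
    general_agri, general_non = draw(agri, 8), draw(non_agri, 8)  # phase (4)
    enterprise_agri = draw(general_agri, 2)                       # phase (5)
    enterprise_non = draw(general_non, 3)
    rest_agri = [h for h in general_agri if h not in enterprise_agri]
    rest_non = [h for h in general_non if h not in enterprise_non]
    consumption = draw(rest_agri, 1) + draw(rest_non, 2)          # phase (6)
    return {
        "occupation": occupation,
        "general": general_agri + general_non,
        "enterprise": enterprise_agri + enterprise_non,
        "consumption": consumption,
    }


# Hypothetical village: 200 listed households, the first 100 agricultural.
rng = random.Random(0)
village = list(range(200))
chosen = select_households(village, lambda h: h < 100, rng)
print({name: len(units) for name, units in chosen.items()})

# Hypothetical strata with made-up rural populations.
print(allocate_villages({"A": 600, "B": 300, "C": 100}, 100))
```

Each later phase draws only from the households retained in the phase before it, so the detailed schedules are always filled in for a sub-sample of the 16 households that received the general particulars schedule.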

The second round of survey was started in April 1951 and was
completed at the end of July 1951. This round also covered
the rural areas only and was concerned mainly with data on
final expenditure. As in the first round the sampling was done
in two stages. In the first stage, 1,160 sample villages were
selected at random from out of 240 strata. In the second stage
a number of households were selected (without further
stratification) from sample villages.

The third round of the survey was started in August 1951 and
was completed in November 1951. Two important changes
were made in this round. First, the survey was extended to
cover the urban areas also. Secondly, the scope of this survey
was further extended to cover information not only on
consumption but also on production, income and costs in household
industries, trading and services.

Fourth Round. The fourth round was started in April 1952
and was completed in September 1952. In this round the
design for the urban areas remained broadly the same as in the
third round but the design for the rural area was completely
changed. In addition to the usual information, in this round,
information about factories was also collected. The factories
covered were those employing 10 or more persons and using
power [Sec. 2 m (i) of the Factories Act] and employing 20 or
more persons but not using power [Sec. 2 m (ii) of the Factories
Act]. A sample size of 32,766 factories was selected and out
of these 26,660 were those covered under Sec. 2 m (i) and 6,086
under Sec. 2 m (ii) of the Factories Act.

The information collected related to (i) the value of fixed
capital, (ii) value of working capital, (iii) value and quantity
of fuel, raw materials, chemicals etc. consumed, inclusive of
services rendered from other sources, (iv) value and quantity
of products and by-products of the factory and services
rendered to customers, (v) duration of working period,
(vi) employment, showing wages and salaries paid, and
(vii) rent of fixed assets secured on lease.




The scope of the survey was further extended to cover
establishments registered or licensed under the Industries
(Development and Regulation) Act of 1951 as amended from
time to time.

The geographical coverage of the 4th round of the survey
remained the same as in the 3rd round, i.e., the whole of India
excluding Jammu and Kashmir, Andaman and Nicobar Islands
and some parts of Assam.

Fifth Round. In the fifth round was also included a survey
of industrial production which covered all household and
non-household enterprises in both rural and urban areas other than
those employing 10 or more workers with power
or 20 or more workers without power on any day in the year.

Sixth Round. The Sixth Round commenced in May 1953. 
It sought to collect comprehensive information on a wide range 
of subjects such as births, deaths, consumer expenditure, other 
household expenditure, agricultural implements, livestock, 
poultry, manufacturing establishments, small scale industry and 
handicrafts, transport, trade, profession and services etc. 

Seventh Round. In this round the same information was 
collected as in other rounds, but the scope of this round was 
more intensified. 

Eighth Round (July 1954 to March 1955). The eighth round
of the NSS started in July 1954 and continued up to March
1955. In the Eighth Round of the NSS information was
collected, as in previous rounds, on household consumption and
production. In addition, an extensive survey was made of
land holdings by including a large number of questions broadly
on the lines of the agricultural survey recommended by the
Food and Agricultural Organisation (FAO). Emphasis was laid
on collection of data concerning ownership and operational
holdings. The other topics taken up for enquiry for the first
time, at the instance of Central and State Governments, were
(1) trend of self-management of agricultural holdings,
(2) household indebtedness, and (3) farming practices.

One very important development in this round of survey
was the participation for the first time of the various State
Governments in the work of the National Sample Survey.

The Survey covered the whole of India excluding the
Andaman and Nicobar Islands and Sikkim. The urban areas
were covered by the Central Sample only. The State of Jammu
and Kashmir was included for the first time since the inception
of the National Sample Survey. The Ladakh district of Jammu
and Kashmir could not, however, be surveyed as it was not
possible to contact the sample villages.

Ninth Round of the National Sample Survey. The ninth round
of the National Sample Survey started in May 1955 and
continued up to November 1955. In the ninth round of the
survey, for the first time, data was collected regarding the
extent and the character of employment and unemployment
for both the urban and rural areas. In addition to this, the
usual enquiries relating to household consumer expenditure and
productive enterprises of households, prices etc. were conducted.

The survey covered the whole of the Indian Union
including Jammu and Kashmir but excluding Sikkim and the
Andaman and Nicobar Islands.

In spite of the limitations and difficulties inherent in the
conduct of a large scale sample survey in India on a subject
like employment and unemployment, this survey has made
available, in quantitative terms, a fairly comprehensive picture
of both rural and urban employment and unemployment in the
country as a whole.

Tenth Round (Dec. 1955 to May 1956). The tenth round of
the NSS started in the middle of December 1955 and continued
up to the end of May 1956. The NSS in its tenth round
attempted to collect acreage statistics for important rabi crops by means
of a somewhat large scale exploratory survey. Crop-cutting
experiments were introduced for the purpose of training field
investigators in the technique of collection of data on yield
rates of important cereals and pulse crops.

In the tenth round of NSS, villages for the purpose of surveys
were selected under two schemes. In the first of these, 1,624
villages were sampled for both socio-economic and land
utilisation surveys. In the second scheme a further set of 3,260
villages were sampled for land utilisation survey only. Out
of this latter set, 1,956 villages were selected for crop-cutting
experiments and were planned to be surveyed during the later
part of the round.

Eleventh and Twelfth Rounds of the National Sample Survey
(August 1956 to August 1957). The Eleventh and Twelfth Rounds
of the NSS started from August 1956 and continued up to
August 1957. The National Sample Survey in its eleventh
and twelfth rounds undertook, in addition to its regular
enquiries, an all-India survey to collect information on
(i) wages, employment, income, expenditure and indebtedness
of agricultural labour households, (ii) housing conditions in
rural areas, and (iii) internal migration.

The whole of India, with the exception of the Andaman,
Nicobar, Amindivi and Laccadive Islands, was covered in the
eleventh and twelfth rounds of the National Sample Survey.
Though the Survey was divided into two rounds, it was
designed as a single continuous survey. In all, 3,696 sample
villages were included in this survey. Information was collected
by interviewing the sample households in the selected villages.

Fourteenth Round of the National Sample Survey (July 1958 to
June 1959). The fourteenth round of the NSS started in
July 1958 and it continued till June 1959. The Survey was
carried out in the whole of rural India except in the Andaman
and Nicobar Islands, North East Frontier Agency, Naga Hills
and Lushai Hills districts in Assam.

The National Sample Survey in its fourteenth round
continued the collection of data on area and yield of principal cereal
crops in India as in previous rounds. The main objective of
the survey, which was planned in the light of the experiences
gained in earlier rounds, was to obtain an all-India estimate
of production of major cereal crops taken together with a
reasonable margin of error.

In addition to this, information was also collected regarding
the standard of living of the working and the middle class, and
the Birth and Death Rates and the Rate of Growth of
Population.

Fifteenth Round of the National Sample Survey (July 1959 to
June 1960). The fifteenth round of the NSS started in
July 1959 and it continued up to the end of June 1960.

As in previous rounds the NSS collected data on area and
yield of principal cereal crops in its fifteenth round in order to
obtain an all-India estimate of production of major cereal
crops (i.e., rice, jowar, bajra, ragi, maize, wheat and barley)
taken together with a reasonable margin of error.

National Income 

Of the many spheres in which economic statisticians help
the economic theorists and policy framers, the field of National
Income is the most important and at the same time most
complicated. National Income statistics have been collected
for a long time but the emphasis on them has greatly increased since
the state has undertaken to promote the economic welfare of its
people. National income estimates provide a broad view of
a country's entire economy, as well as of its various
components, and over a number of years they reveal the change in its
economy.

Among the various problems that confront the statistician,
the initial problem is the problem of definition. As the
aggregate economy can be described from a number of aspects the
concept of National Income can also be variously defined.
The method of estimation would depend upon the way in
which we define National Income which in turn would depend
upon the purpose for which National Income estimates are
required. The definition should, however, be theoretically
scientific and at the same time provide a statistical approach.

Now if the purpose of the study of National Income is to
examine the distribution of personal incomes or the ways in
which individuals divide their incomes between spending and
saving, then National Income can be defined as the sum of
all personal income before taxation. If the purpose is to
study the relation between the rate of taxation and the yield
of revenues, national income should include all personal
incomes and the undistributed profits. If the purpose is to
examine the standard of living, national income can be defined
as the money value of all goods and services produced in a
country during a period of time. According to the report of
the Bowley-Robertson Committee national income is defined in the
following words: “The national income is the money measure of
the aggregate of goods and services accruing to the inhabitants
of a country during a year, including nett increment to or
excluding nett decrements from their individual or collective wealth.”

A distinction has to be drawn between national income at
factor cost and national income at market prices. National
Income, when it is defined as the income accruing from current
production of goods and services, is called national income at
factor cost. Factor costs represent what producers receive for
their products and market prices represent what consumers pay
for them. The difference between the two is the nett amount
of indirect taxes or similar charges which the state obtains from
the market value before it goes to the producers. Indirect taxes,
such as excise duties on beer and tobacco, are included in the market
price which the consumers pay for the goods and services in
question. The nett national income at factor cost gives the
income actually received by the factors of production. Total
market value of goods and services produced is estimated by
adding the total of indirect taxes and depreciation to the nett
national income at factor costs.
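
The relation just stated can be put as a small piece of arithmetic. The sketch below is only illustrative: the function names and all the figures are hypothetical and are not taken from any official estimate.

```python
def income_at_market_prices(nni_factor_cost, indirect_taxes):
    """Nett national income valued at what consumers pay."""
    return nni_factor_cost + indirect_taxes


def total_market_value(nni_factor_cost, indirect_taxes, depreciation):
    """Total market value of output: factor cost + indirect taxes + depreciation."""
    return nni_factor_cost + indirect_taxes + depreciation


# Hypothetical figures (in crores of rupees), chosen only for illustration.
nni_factor_cost = 8000
indirect_taxes = 600
depreciation = 400

print(income_at_market_prices(nni_factor_cost, indirect_taxes))
print(total_market_value(nni_factor_cost, indirect_taxes, depreciation))
```

Read the other way, the nett national income at factor cost is what remains of the total market value after indirect taxes and depreciation have been deducted.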

There are three traditional methods of measuring the
national income. They are :

(i) The Income Method,

(ii) The Inventory Method or Production Method, and

(iii) The Expenditure Method.

Triple Entry Balance Account takes into consideration all
the three above named methods, and offers a check of one
method upon another.




(i) The Income Method. The National Income can be
regarded as the sum of the incomes obtained from economic activity
by the country's inhabitants. This would record the
distribution of income among various kinds of income receivers in the
form of rents, profits, interests, wages and salaries. The data
for estimating national income by this method are obtained
from income-tax statistics, wage-statistics, etc.

(ii) The Inventory Method. National Income can be measured
in terms of the nation's output of goods and services to give us
the nett national output. The nett value of each industry’s
output of goods and services is found out. Nett value is equal
to the total selling value of output minus (a) the value of
those goods and services which are bought from other
industries or from abroad, and (b) depreciation. Under this
method national income is grouped not according to the people
who earn it, but according to the industries in which the
income is earned. The data corresponding to this method
are available in the materials of a census of production.

(iii) The Expenditure Method. National Income can also
be measured through the channels of its expenditure. Nett
national income consists of the sum of two sets of activities,
concerned with the ultimate disposal of income, namely,
expenditure on consumer goods and expenditure on investment
goods. The information for the estimation of income through
this method can be found in financial and foreign sales
transactions, family budget inquiries, and retail and wholesale prices.

The difficulty of using these traditional methods lies in the
fact that these methods are not alternative methods of
estimating national income, but they complement each other. But in
the use of Triple Entry Balance Account (T.E.B.A.) every
income generating activity is entered from three distinct sets
of basic data. In so far as each of these approaches sets out
the same national income each of these totals constitutes a
check on the other. Many of the constituent details can be
checked separately with items or combination of items derived
from alternative approaches.
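
The mutual check among the three approaches can be sketched as follows. The two-industry economy, every figure in it, and the function names are hypothetical and serve only to show the arithmetic; a real account would have many more heads under each approach.

```python
# Hypothetical closed economy with two industries (figures in crores of rupees).
industries = {
    "agriculture": {"sales": 500, "purchases": 100, "depreciation": 20},
    "manufacture": {"sales": 400, "purchases": 150, "depreciation": 30},
}
incomes = {"wages": 300, "salaries": 100, "rents": 80, "interests": 40, "profits": 80}
expenditure = {"current consumption": 540, "nett investment": 60}


def nett_output(industry):
    """Selling value minus goods and services bought in, minus depreciation."""
    return industry["sales"] - industry["purchases"] - industry["depreciation"]


def three_way_totals(industries, incomes, expenditure):
    """Return the national total as reached by each of the three methods."""
    return {
        "inventory": sum(nett_output(i) for i in industries.values()),
        "income": sum(incomes.values()),
        "expenditure": sum(expenditure.values()),
    }


totals = three_way_totals(industries, incomes, expenditure)
print(totals)
print("totals agree:", len(set(totals.values())) == 1)
```

If the three totals disagree, the discrepancy points to an omission or a double count under at least one of the approaches, which is the sense in which each total is a check on the others.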



T.E.B.A.

Income              Output                      Expenditure

1. Wages            1. Agricultural             1. Expenditure on current
2. Profits          2. Livestock                   consumption of goods
3. Interests        3. Mining                      and services
4. Rents            4. Manufacture              2. Nett investment
5. Salaries         5. Distribution services
                    6. Transport etc.

Nett national       Nett national               Nett national
income              output                      expenditure

The T.E.B.A. supplies most of the information normally
required for the formation and computation of economic policy.
The act of drawing a T.E.B.A. tends to reduce the obscurities
of definition and dangers of double counting which may arise
in case national income is computed from one point of view.

National Income and its Estimation. A national income estimate
measures the volume of commodities and services turned out
during a given period. For purposes of the estimate all
economic activity, whether it be the production of shoes or of
airplanes, services in the way of medical care or provision of
justice, is considered and the volume of its output measured.
Since these activities are of varied types, no single basis
can be used for measuring their output and as such the various
producers are classified into different groups. For each of these
producing groups the estimates gauge the nett product which
their activities yield. Thus the estimate of national income
provides not only the figure of total national dividend but also
its distribution over the various groups of income recipients
(both by the type of productive factor which their income
represents and by the size of their income and also between
spenders, savers, and tax-payers).

Just as national income estimates distinguish various groups
of producers and income recipients they also try to cover fully
the several economic functions, such as Production,
Distribution and Expenditure. “In the circuit flow of economic activity
the same total income can be measured at the point of
production, as a sum of nett outputs arising in the several industrial
sectors of the nation’s productive system; at the point of flow
of incomes, as the sum of all incomes in cash, in kind and
retained by enterprises as nett profit; at the point of final
utilisation, as the sum of consumer expenditures, government
purchases of goods and nett outlay on capital goods.”

Difficulties in Estimation. The task of making these estimates
is beset with many difficulties. There is always a danger of
the same commodity being counted twice, and as such special
care is needed to avoid the double counting involved in
including both the raw material and the commodity into which the
raw material enters. This problem becomes still more intricate
when one is to decide about that part of the Government’s
general administration which is service to the business and thus
included into the value of its product, and that which is service
to the people as individuals. Such a decision becomes
imperative inasmuch as the former is not and the latter is to be
counted for purposes of national estimates.

Again, as the National Income Committee Report puts it,
“the reduction of the numerous economic activities of the
millions of people in a country to a common denominator that
permits quantitative measurements is clearly beset with
intellectual difficulties. How does one add together the services of
a street sweeper with those of the Prime Minister, the product
of a village carpenter with that of a steel mill?”

India's National Income 

The first pioneer effort in the computation of national
income of our country was made by Dr. V. K. R. V. Rao
when he calculated the national income for 1931-32. The first
systematic effort in this direction was made by the Government
of India in the year 1949 when it appointed a committee to
report on the country's national income. A national income
unit was also created by the Government to compile
authoritative estimates of the national income.

Difficulties of Estimation in India

Besides the difficulties mentioned earlier there are more
serious problems of estimating national income which are
rather peculiar to India. In this connection the committee
makes a mention of the following :

(i) There is no uniform basis of evaluating commodities
and services in terms of money, for a considerable portion of
our output does not come to the market at all and is either
being consumed by the producer himself or is bartered for other
commodities and services. This might necessitate, as the
committee remarks, the inclusion in the estimates of India's National
Income of a classification of 'monetary' and 'non-monetary'
sectors which one would not find in national income estimates
of other countries.

(ii) The problem of measurability is further aggravated by
the fact that the Indian producers do not generally maintain
any reliable records either of the quantity or of the value of
their output. This means that an important source for the
collection of reliable relevant economic statistics, available in
western countries, is denied to the investigators here. This is
due, apart from the comparative dearth of the technical
personnel needed for the purpose, to the illiteracy of the majority
of population, the semi-subsistence character of their economic
activity and the general absence of the practice of keeping
accounts.

(iii) There is a comparative lack of differentiation in
economic functioning. Household enterprises, which constitute
a major portion of our economy, simultaneously and without
differentiation, perform functions which would normally fall
under different industrial categories. This means that the
customary industrial classification cannot be adopted with the
same exactitude in our country as in the countries of the west.

(iv) Then there is, as if to crown all others, another
difficulty, viz., 'the non-availability of adequate empirical data'.
There are scarcely any current data on the economic structure
of the basic industry of the country (agriculture and related
activities): no information on the structure of costs, on consumer
expenditures of the population attached to land or on their
savings, if any. Neither are there any recent or sufficiently
comprehensive data on the consumption, expenditure or savings
of the urban population. There is also no information as
regards distribution of income by size nor any data which
might permit an estimate of capital formation.

Methods followed for Estimation. There are two important
methods of calculating the National Income : (i) The census or
inventory method, and (ii) the income method. The first
consists in the evaluation of the goods and services accruing and
the second is a summation of the individual incomes. The
inventory method can be used only if there exists a
comprehensive census of production; the use of income method depends on
the existence of (i) Income-tax statistics, and (ii) an adequate
wage census.

The broad pattern of the method followed by the committee
has its basis in the methods adopted by Dr. Rao, and has been
governed by the availability of the data. Since it was not possible
to use either the 'inventory' or 'income' method to cover the
entire range of the economy, the two methods have been
combined, resulting in what may be called an
'Inventory-cum-Income' method of calculation.

The total working-force for 1948-49 and its distribution
among different occupations have been determined. This
occupational classification is made on the basis of the
classification of the economy by industry (industry in this context
also includes agriculture, services and all other means of income
generation). The inventory method is applied to the following
sectors :

(i) Exploitation of animals and vegetation, 

(ii) Exploitation of minerals, 

(iii) Industry. 

The income method is adopted for the remaining sectors of
our economy, i.e.,

(i) Transport,

(ii) Trade,

(iii) Public force,

(iv) Public administration,

(v) Professional and liberal arts,

(vi) Domestic services.




The sum of the nett product of all the above sectors of our
economy gives the nett domestic product at 'factor costs'. This,
in other words, represents the income as received by the various
factors of production, e.g., labour, capital, etc. Adjusting this
figure with the estimated earned income from abroad the
national income for 1948-49 is derived.

The committee estimated India's National Income during
1948-49 at Rs. 87.1 abja or Rs. 8,710 crores, and taking the
population at 341.04 millions, has calculated that the average
individual income is about Rs. 255.
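
The average individual income quoted above follows directly from the two figures given. A minimal check in Python, using the standard values of one crore (ten million) and one abja (one thousand million); the variable names are only illustrative:

```python
CRORE = 10**7      # one crore = ten million
ABJA = 10**9       # one abja = one thousand million
MILLION = 10**6

national_income = 8710 * CRORE        # Rs. 8,710 crores
population = 341.04 * MILLION         # 341.04 millions

# The same total expressed in abja, as quoted by the committee.
income_in_abja = national_income / ABJA

per_capita = national_income / population
print(income_in_abja)
print(round(per_capita))
```

The quotient is a little over Rs. 255, which agrees with the committee's figure.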

The report contains some interesting breakdowns of this
estimated National Income. Its analysis by industrial origin
indicates that 47.6 per cent of the total national income is
contributed by agriculture, which includes animal husbandry,
processing, marketing and ancillary activities performed by the
cultivator, forestry and fishing; 17.2 per cent from mining,
manufacturing and hand-trades ; 19.5 per cent from commerce,
transport and communication; and 15.9 per cent from
miscellaneous groups which include professions and administrative
services. Broken according to the character of enterprise, 61.3
per cent comes from small enterprises, largely household
production, and only 12 per cent from larger enterprises, factory
establishments contributing three-fourths of this amount.

Information has also been supplied regarding the nett output 
per engaged person in various occupations, which is as follows: 



Nett output per Engaged Person

                                          Rs.

1. Agriculture                            500
2. Mining and factory establishment     1,700
3. Small enterprises                      600
4. Railway and communication            1,900
5. Bank, Insurance etc.                 1,500
6. Professions and arts                   600
7. Domestic service                       400
8. Government service                   1,300


In arriving at the figure of national income and other
related estimates the committee has made use of a number
of expedients, assumptions and guesses in order to overcome
the many gaps in the statistical field and to extend its
coverage to the whole of the country and to all the sections of its
economy. As such, too much emphasis cannot be laid on the
provisional nature of these estimates, and no reliable
conclusions can be drawn on the basis of these figures. Quite
apart from these considerations, the utility of these estimates
is further marred by their isolation from the past. The
assessment of any nation's income for a particular year does
not mean very much, standing by itself; the real value of the
technique is in revealing shifts and changes in the economic
structure over a period of years.

Estimates of National Income are now available in a
continuous manner from 1948-49 onwards. Besides, the great
usefulness of such measures, even if relating to only a single
year, for any considerations of questions of economic policy is
obvious enough, particularly if one remembers that in the
process of arriving at such measures, a great deal of related
information (e.g., on the number of workers, on the
geographical distribution within the country, on gross income, expenses
etc.) is usually secured.

A study of the figures of national income and their
comparison with Dr. Rao’s estimates for 1931-32 indicates that
India in 1948-49 was not richer than in 1931-32 ‘in real terms’.
Such a comparison has not been attempted by the
committee (perhaps because of a difference in the technique of
estimation). But we have the authority of Dr. Rao himself
for such a comparison, who, in a press conference soon after
the release of the report, pointed out that his estimate of
per capita income in 1931-32 was Rs. 65 which would come to
about Rs. 260 when adjusted to increased prices and the inflated
values of other contributory factors obtaining in 1948-49. This
indicates that the per capita income has not gone up.

Figures showing the distribution of national income by
industrial origin confirm the predominance of agriculture in
the national economy which contributes nearly 48 per cent
of the total national income. The relative importance of the
commodity sector is further emphasised by the point that
commodity production represents two-thirds of the total income.




When these estimates are regrouped according to the
size of the enterprise, the relative importance of small
enterprise (largely household) in our economy becomes very
clear. It would have been more informative if this division were
pursued further within important industries, since the relative
shares of small and large enterprises in any one field are a matter
of interest in devising priorities and allocating resources.

The estimates dealing with nett output per engaged person
provide a still more interesting study. These range from
Rs. 400 for domestic service to Rs. 1,900 for railways and
communications. The committee has, however, emphasised
that too much should not be read into these figures which
represent neither the productivity per worker of each sector
(since, for example, non-working proprietors have been
included) nor the average earnings per engaged person (since
income payments like interest may conceivably go to persons
outside the sector). However, bearing in mind that the
capital investment in the way of mechanical equipment per
worker varies widely from sector to sector, the figures give a
comparative picture of the share per person in the different
sectors. High nett output in government service explains the
desire of the educated young men for government services; in
the same way low output in domestic services can be taken as
the cause of a dearth of domestic servants.
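
The range quoted above can be checked directly against the committee's table of nett output per engaged person. The following Python sketch is only an illustration: the dictionary name and layout are assumptions made here, and the domestic service figure is taken as Rs. 400, the value stated in the text.

```python
# Nett output per engaged person (Rs.), as given in the committee's table.
# The variable name and dictionary layout are illustrative assumptions.
nett_output_per_person = {
    "Agriculture": 500,
    "Mining and factory establishment": 1700,
    "Small enterprises": 600,
    "Railway and communication": 1900,
    "Bank, insurance etc.": 1500,
    "Professions and arts": 600,
    "Domestic service": 400,  # figure as quoted in the text
    "Government service": 1300,
}

# Occupations with the lowest and the highest nett output per person.
lowest = min(nett_output_per_person, key=nett_output_per_person.get)
highest = max(nett_output_per_person, key=nett_output_per_person.get)
ratio = nett_output_per_person[highest] / nett_output_per_person[lowest]

print(lowest, nett_output_per_person[lowest])
print(highest, nett_output_per_person[highest])
print(f"highest / lowest = {ratio:.2f}")
```

The ratio gives a rough sense of how widely the sectors differ, subject to the committee's own caution that these figures measure neither productivity nor earnings.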

Combining the nett output per worker figures with the
percentages of contribution of the various industrial sectors
to the total national income we find that India gets nearly 48
per cent of her income from an occupation where the nett
output per worker is among the lowest and that she derives only
7.3 per cent from mining and manufacture where output per
worker is among the highest. This explains the relative poverty
of the Indian people.

There can be other breakdowns of national income which
may be of great importance in economic analysis, such as
allocation of the national product between urban and rural
sectors, consumer expenditure, private saving and capital
formation, but which the committee was unable to undertake
for lack of statistical information of adequate coverage or
sufficient reliability. So long as this deficiency continues the
relevance of such estimates to policy making will not be fully
appreciated. It is, however, expected that in the near future
a good deal of new statistical data will be available and by
the time the final report is submitted a sound foundation will
have been laid for the national income estimation in India.
It is, therefore, not by the information contained in the report
that we are to judge its importance but by the fact that a
process of systematic collection of data and its scientific
tabulation and interpretation has been set in motion by the
committee's labours. This report has made a beginning of a
useful vein of inquiry which will be followed along careful
and comprehensive lines, and it can be confidently predicted
that before long our national income estimates could be relied
upon for purposes of economic policy much to our benefit.

Official Estimates of National Income by the N.I.U.

The National Income Unit (NIU) of the CSO prepares the
national income estimates every year and publishes a paper
giving the corresponding figures of the previous years. The
NIU, in estimating the national income, adopts two methods,
the income method and the production method: the income
method for the following sectors: (i) commerce, transport and
communications including banking, and (ii) other services,
including professions and liberal arts, government services,
domestic service and house property; and the production
method for the remaining sectors of the economy, i.e., (i)
agriculture and animal husbandry including forestry and fishery,
and (ii) mining and manufacturing including factory
establishments and small enterprises.

The sum of the above items gives the nett domestic product
at factor cost and to this is added the nett earned income from
abroad and the total yields the nett national output at factor
cost, i.e., the national income.

The latest report published in 1964 gives the estimates of
national income for the years 1948-49 to 1962-63. The
technique adopted in estimating the national income is broadly the
same as adopted by the National Income Committee (NIC).
Table 23.1 gives a comparison of movement of Nett National
Output (at factor cost) at current and constant prices for the
period 1948-49 to 1962-63.



TABLE 23.1

Comparison of Movement of Nett National Output (at Factor Cost)

[Table body not recoverable; the rows run by year from 1948-49
to 1962-63 (preliminary).]


Table 23.2 gives the national income by industrial origin at
current prices for the following sectors: (i) agriculture,
(ii) mining, manufacturing and small enterprises, (iii)
commerce, transport and communication, and (iv) other services
for the period 1948-49 to 1962-63.


TABLE 23.2

(current prices in Rs. abja)

                                        1962-63*  1960-61  1955-56  1950-51
              (1)                          (2)      (3)      (4)      (5)

Agriculture
 1. Agriculture, animal husbandry
    and ancillary activities               67.8     66.8     43.9     47.8
 2. Forestry                                1.2      1.1      0.7      0.7
 3. Fishery                                 0.7      1.0      0.6      0.4
 4. Total of agriculture                   69.7     68.9     45.2     48.9

Mining, manufacturing and
small enterprises
 5. Mining                                  2.0      1.6      1.0      0.7
 6. Factory establishments                 16.9     13.2      7.8      5.5
 7. Small enterprises                      12.1     11.2      9.7      9.1
 8. Total of mining, manufacturing
    and small enterprises                  31.0     26.0     18.3     15.3

Commerce, transport and
communication
 9. Communication (post, telegraph
    and telephone)                          0.8      0.6      0.5      0.4
10. Railways                                4.3      3.6      2.5      1.8
11. Organised banking and insurance         2.1      1.6      0.9      0.7
12. Other commerce and transport           19.0     17.6     14.9     14.0
13. Total of commerce, transport
    and communication                      26.2     23.4     18.8     16.9

Other services
14. Professions and liberal arts            8.4      7.4      5.6      4.7
15. Government services
    (administration)                       11.7      9.0      5.7      4.3
16. Domestic service                        2.2      1.9      1.4      1.3
17. House property                          5.6      5.3      4.6      4.1
18. Total of other services                27.9     23.6     17.3     14.4

19. Nett domestic product
    at factor cost                        154.8    141.9     99.8     95.5
20. Nett earned income
    from abroad                            -0.8     -0.5      0.0     -0.2
21. Nett national output at factor
    cost = national income                154.0    141.4     99.8     95.3

* Preliminary.


The above table indicates the importance of agriculture in
our economy. According to 1962-63 figures, out of the total
national income of Rs. 154.0 abja, Rs. 69.7 abja is derived from
agriculture, animal husbandry and forestry. In terms of
percentage it comes to 45.3%, whereas mining, manufacturing
and small enterprises contribute 20.1%, commerce, transport
and communications 17%, other services 18.1%, and nett
earned income from abroad -0.5%.
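
The percentage shares for 1962-63 can be reproduced from the sector totals of Table 23.2. The sketch below is illustrative; the function name `percentage_shares` and the dictionary layout are assumptions made here, not part of the N.I.U. procedure. On these figures, other services work out to about 18.1% of national income.

```python
# 1962-63 (preliminary) figures at current prices, in Rs. abja,
# taken from the sector totals of Table 23.2.
sector_income = {
    "Agriculture": 69.7,
    "Mining, manufacturing and small enterprises": 31.0,
    "Commerce, transport and communication": 26.2,
    "Other services": 27.9,
    "Nett earned income from abroad": -0.8,
}
national_income = 154.0


def percentage_shares(components, total):
    """Return each component as a percentage of total, to one decimal."""
    return {name: round(100 * value / total, 1)
            for name, value in components.items()}


shares = percentage_shares(sector_income, national_income)
for name, share in shares.items():
    print(f"{name}: {share}%")

# The components should add up to the national income (within rounding).
components_total = sum(sector_income.values())
print(f"Sum of components: {components_total:.1f}")
```

Because the four sector totals and the nett income from abroad add up to the national income, the five percentages add up to 100 apart from rounding.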

Table 23.3 gives the relationship between national income
and some other main aggregates of income and product.



TABLE 23.3

(current prices in Rs. abja)

                                        1962-63*  1960-61  1955-56  1950-51

 1. Nett domestic product
    at factor cost                        154.8    141.9     99.8     95.5
 2. Earned income from abroad              -0.8     -0.5      0.0     -0.2
 3. Nett national product at factor
    cost = national income                154.0    141.4     99.8     95.3
 4. Indirect taxes including
    miscellaneous fees                     15.4     12.5      7.0      5.4
 5. Less subsidies                         -1.0     -0.6     -0.2     -0.4
 6. Nett national product
    at market price                       168.4    153.3    106.6    100.3

 7. Nett domestic product
    at factor cost                        154.8    141.9     99.8     95.5
 8. Less income from domestic
    product accruing to government         -1.4     -1.3     -0.7     -0.7
 9. Income from domestic product
    accruing to private sector            153.4    140.6     99.1     94.8
10. National debt interest                  1.7      1.2      0.5      0.4
11. Earned income from abroad              -0.8     -0.5      0.0     -0.2
12. Transfer payments                       2.4      2.1      1.3      0.6
13. Nett private donations
    from abroad                             0.3      0.3      0.4      0.4
14. Private income                        157.0    143.7    101.3     96.0

* Preliminary.
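
The relationships among the aggregates of Table 23.3 can be verified by simple addition. The following Python sketch is illustrative only: the variable and function names are assumptions made here, and the 1950-51 entry for nett private donations from abroad is taken as 0.4, the value implied by the private income total.

```python
# Rows of Table 23.3 (Rs. abja, current prices). Columns, in order:
# 1962-63 (preliminary), 1960-61, 1955-56, 1950-51.
years = ["1962-63", "1960-61", "1955-56", "1950-51"]
ndp_factor_cost = [154.8, 141.9, 99.8, 95.5]
income_from_abroad = [-0.8, -0.5, 0.0, -0.2]
indirect_taxes = [15.4, 12.5, 7.0, 5.4]
subsidies = [1.0, 0.6, 0.2, 0.4]
government_income = [1.4, 1.3, 0.7, 0.7]
debt_interest = [1.7, 1.2, 0.5, 0.4]
transfer_payments = [2.4, 2.1, 1.3, 0.6]
private_donations = [0.3, 0.3, 0.4, 0.4]  # last value assumed, see lead-in


def derive_aggregates(i):
    """Derive the four aggregates of Table 23.3 for column i."""
    national_income = ndp_factor_cost[i] + income_from_abroad[i]
    nnp_market_price = national_income + indirect_taxes[i] - subsidies[i]
    private_sector_income = ndp_factor_cost[i] - government_income[i]
    private_income = (private_sector_income + debt_interest[i]
                      + income_from_abroad[i] + transfer_payments[i]
                      + private_donations[i])
    return tuple(round(x, 1) for x in (national_income, nnp_market_price,
                                       private_sector_income, private_income))


results = {year: derive_aggregates(i) for i, year in enumerate(years)}
for year, values in results.items():
    print(year, values)
```

Each tuple lists, in order, national income, nett national product at market price, income from domestic product accruing to the private sector, and private income.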





Table 23.4 shows the nett national output at 1948-49 prices
sectorwise.

TABLE 23.4

Nett National Output at 1948-49 prices (sectorwise)

(in Rs. abja)

                                        1962-63*  1960-61  1955-56  1950-51

1. Agriculture, animal husbandry
   and ancillary activities                58.0     59.0     50.2     43.4
2. Mining, manufacturing and
   small enterprises                       23.1     21.1     17.6     14.8
3. Commerce, transport and
   communications                          26.4     24.6     19.7     16.6
4. Other services (b)                      27.0     23.1     17.3     13.9
5. Nett domestic product
   at factor cost                         134.5    127.8    104.8     88.7
6. Nett earned income from abroad          -0.8     -0.5      0.0     -0.2
7. Nett national output at factor
   cost = national income                 133.7    127.3    104.8     88.5

* Preliminary.


If we just study the contribution of the agricultural sector
to the national income, the above table shows that in absolute
terms it is going up gradually. But in terms of percentage,
if we take up the corresponding figures, we find that in 1950-51,
income from the agricultural sector was 49% at 1948-49 prices.
In 1955-56, this percentage came down to 47.9% and again in
1960-61 it was only 46.4%. According to preliminary estimates
for the year 1962-63, it has further gone down to 43.4%. In
another sector, mining and manufacture, this percentage is
almost constant, although the absolute figures show a tendency
to rise. In the other two sectors, commerce, transport and
communications and other services, the percentage figures show
a tendency of slight increase.
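
The declining share of agriculture can be recomputed from Table 23.4. This is an illustrative Python sketch; the list names are assumptions made here. Working from the rounded table entries, the 1960-61 share comes out a shade below the 46.4% quoted above, presumably because the published percentage was worked from unrounded figures.

```python
# Table 23.4 figures at 1948-49 prices (Rs. abja), oldest year first.
years = ["1950-51", "1955-56", "1960-61", "1962-63"]
agriculture = [43.4, 50.2, 59.0, 58.0]
national_income = [88.5, 104.8, 127.3, 133.7]

# Share of agriculture in national income, in per cent, to one decimal.
agriculture_share = [round(100 * a / n, 1)
                     for a, n in zip(agriculture, national_income)]

for year, share in zip(years, agriculture_share):
    print(f"{year}: {share}%")

# True when each year's share is lower than that of the year before.
share_is_falling = all(later < earlier
                       for earlier, later in zip(agriculture_share,
                                                 agriculture_share[1:]))
print("Share falls throughout:", share_is_falling)
```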

Method of Estimation

Estimates for Agriculture, Forestry, Fishing and Mining. In
estimating the national income from agricultural sector, the N.I.U.
took the estimates of quantities and values of crops from the
figures available through the village patwaris. There is a
possibility of error in these estimates since they are subject to
the limitations of agricultural statistics in our country. For
arriving at the value of livestock and livestock products, the
N.I.U. used the figures of the livestock census of 1945 and 1951
and the official marketing reports in that connection.

Manufacturing and Handicrafts Estimates. For the sector of
manufacturing and handicrafts, the N.I.U. relied upon the
annual Census of Manufactures and the Sample Survey of
Manufacturing Industries. For wage statistics, figures were
taken from the statistics compiled by the Inspectors of Factories.
Since adequate statistics are not available with regard to small
scale industries, the N.I.U. has to face difficulties and devise
methods to obtain the necessary data. Thus in calculating the
employment figures of small scale industries the N.I.U. took the
difference between the total of persons employed in industries
and the total of factory employment.

Estimates of Trade, Finance and Transportation. In calculating
the national income from this sector, the N.I.U. applied the
income method and obtained information from various official
and non-official sources. In case of finance, use was made of
the data published by banks, insurance companies, and
co-operative societies and a sample analysis of the balance sheets
and profit and loss accounts by the Reserve Bank. In case of
income from insurance, it was computed as the amount of
compensation paid to employees and operating surplus after
deducting the amount of insurance premiums paid by various
sectors of the economy.




Estimate of Income from House Property. In this case, the
N.I.U. divided property into two classes, rural and urban
areas, and made use of the records of the local bodies for the
annual value of house property. The number of houses was
taken from the census figures and a particular use was made
of the figures of average rent as shown by 31 municipalities.
In finding out the nett annual value from house property,
allowance has been made for repairs, depreciation etc. on the
basis of sample returns from the municipalities. In case of
rural housing, data collected by N.S.S. has been made use of.

Estimate of Government, Professions and Domestic Services. The
N.I.U. divided government sources of income into two classes,
administration and business enterprises. In case of
administration, they prepare a 'Current Account of Public Authorities'
giving expenditure on the one side and revenue on the other. In
case of business enterprises, they give the figures of expenditure
and revenue for railways and others separately and then the
total. In estimating wages and salaries, they have taken
it to be 33.3% of total expenditure in public works and 50%
in other works. They have also made use of the budgetary
estimates in estimating salaries and wages paid. In estimating
the value of domestic services, the N.I.U. adopted the method
of average earnings as was followed by Dr. Rao.

Estimate of Nett Income from Abroad. In this case the N.I.U.
has made use of the balance of payments statistics compiled
by the Reserve Bank of India.

The N.I.U. has placed the margin of error at 10% on the
whole with as high a margin of error as 25% in forestry and
fishery and 33.3% in small enterprises, commerce and transport,
professions and liberal arts, domestic services and house
property.



The technique of accounting for the current operation of
the economy of a society, which goes by the name of social
accounting, is of very recent origin,¹ and though it is based on
the double entry principle, accountants neither here nor abroad
have shown any inclination to regard it as something akin to
their profession. The idea of accounting for the society, in
fact, originated in the field of applied economics and its origin
and growth can be traced to developments in the character of
National Income studies. Formerly the main purpose of
National Income studies was to show how wealthy the
community was and how evenly the wealth was distributed, especially
in comparison with other communities and within the same
community at different times. This purpose was served by a
simple estimate of total national income of a community. In
recent years there has been a change in the approach to
national income studies. Nowadays these studies are intended
to provide a clearer picture of the nation's economic activity,
a picture that may help us in understanding what takes
place in the economy, how the entire economic system works
and what are the relationships among the various sectors of the
economy.

The main cause of this change was the developments in
monetary theory and the emergence of modern theories of
income determination and of general equilibrium, associated
with the name of the late Lord Keynes. Keynes propounded
a simple proposition that the level of employment and income
depends upon the level of expenditures or the effective demand
for goods and services. He also demonstrated that if the
Government, by a policy of intervention, could increase the
national expenditure to a size large enough, it could ensure
employment to all seeking work. Other factors that gave an
impetus to this approach were the special circumstances that
prevailed during and after the Second World War. A first
hand knowledge of the working of the economic system and
that of the network of relationships among the different parts
of the economy was needed for complete mobilisation of
economic resources for the successful prosecution of the war. In the
post-war period, the information was desired to throw light on
problems of economic reconstruction and development, and for
assessing economic change as a background for economic
decision-making in connection with public policy.

¹ The idea that the economic system could be depicted as a system of
flows of goods and money is at least as old as the famous Tableau
Économique of the eighteenth century French economist, Quesnay. The
Tableau Économique was a graphic representation of the way in which the
circulation of wealth takes place: how certain groups of people produce
wealth, how this is passed on to other groups in the form of money
incomes, and how the latter groups use this money income to buy the
goods produced. The modern system of social accounting is essentially
the same as it also tries to give a graphic representation of the
production and consumption of wealth in a community in the form of
incomes and expenditures of all the various groups in the community.

For a proper appreciation of the structure of Social
Accounts, it is necessary to have a clear idea about certain
characteristics of the economic system. This system is based
on division of labour involving specialisation in production
and exchange of products and services through the medium
of money. The position actually existing is that people offer
money to specialised producers in exchange for products they
want and these productive enterprises offer money in exchange
for productive services. The people who buy goods are in fact
the persons who receive money income. Thus the economic
system can be pictured as the flows of money incomes and
money expenditures. Expenditure on goods and services results
in a flow of money from the people to productive enterprises,
and payments made by producers to various factors of
production result in flows from productive enterprises to the people.
The relationship between these flows of income and
expenditure is fundamental to the whole study of Social Accounts. It
can be easily demonstrated that the flow of income must be
exactly the same as the flow of expenditure. The total sales
receipts of productive enterprises from final consumers, in the
ultimate analysis, will go in the form of wages, salaries, rents,
interest and profits. This inter-relationship can be stated in a
generalised form by saying that for any period, expenditure by
purchasers, value of production and incomes produced are all
exactly the same things looked at from different points of
view. This means that production, expenditure and income
must balance each other. Now, suppose that final purchasers
decide to reduce their spending. This would lead either to a
reduction in production, in which case incomes too would be
reduced, or, if production is maintained at the same level, the
entire reduction in the sales receipts will be met out of profits.
In the same way, if expenditure is increased either more will
be produced with a consequent increase in wages, profits etc.,
or the same amounts will be sold at a higher price yielding
higher profits. This process is the whole foundation of the
system of social accounts.

This working of the economic system is, however,
complicated by the fact that the recipients of income do not spend all
that they receive. A part goes to saving in one form or
another. Saving is that part of a person's income which is not
spent on purchases for current consumption. Now, if the
incomes are to be kept intact this saving must be offset.
This offset is created by expenditure on current production, or
capital equipment. The income received by producers of
capital equipment is financed not out of sales receipts from
consumers but by the expenditure of the people who are
willing to finance the purchase of the capital equipment (in the
form of investments). This means that, in addition to demand
for consumption, there are other sources from which demand
might arise, viz., financing the production process, financing
the stocking of goods and financing the capital requirements.
Thus expenditure on consumption combines with expenditure
on investment to determine the total sales receipts of productive
enterprises.
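
The offsetting of saving by investment can be illustrated with a small numerical sketch in Python. All figures below are hypothetical and the function names are assumptions introduced here for illustration; they do not come from any official estimate.

```python
def saving(income, consumption):
    """Saving is the part of income not spent on current consumption."""
    return income - consumption


def total_sales_receipts(consumption, investment):
    """Sales receipts of productive enterprises: consumption plus investment."""
    return consumption + investment


# Hypothetical figures, in any convenient money unit.
income = 100.0
consumption = 80.0
saved = saving(income, consumption)

# Case 1: investment expenditure fully offsets saving,
# so receipts equal the income originally paid out.
receipts_offset = total_sales_receipts(consumption, investment=saved)

# Case 2: investment falls short of saving, so receipts, and hence
# the incomes that can be paid out next, fall below the original level.
receipts_short = total_sales_receipts(consumption, investment=15.0)

print(saved, receipts_offset, receipts_short)
```

In the first case incomes are kept intact; in the second the shortfall of investment below saving shows up as reduced sales receipts.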

The foregoing description of the working of the economic
system can be described by means of a simple diagram :

    Production by productive enterprises (to meet demand)
          yields
    Incomes produced and received by income recipients,
          which are disposed of in two ways:
              (a) Consumption
              (b) Saving
          which finance
    Demand for Consumption and Investment

This diagram provides a basic pattern of social accounts
which can be developed further to provide more useful
information. Production is undertaken by different kinds of
enterprises and the motives which inspire them to determine their
scale of output are also different. Similarly, there are various
income recipients whose expenditure decisions are not similar.

In order, therefore, that Social Accounts may be able to 
provide useful information for the purpose of the formulation 
of economic policy, it is necessary that the entire economy is 
demarcated into various sectors and the transactions of each 
sector are classified into various economic functions. This is 
done by recognising separate sectors of the economy and by 
keeping more than one account of each sector into which are 
entered transactions relating to one particular function or 
form of activity.

The common practice with regard to the system of social
accounts is to divide the national economy into three sectors,
viz.

(i) Business enterprises or ‘Firms’, 

(ii) 'Households' and private non-profit institutions, and 

(iii) General Government. 

Business enterprises are labelled as 'firms' and include all
organisations and institutions which produce goods and
services for sale to the general public at a price intended to
cover the cost of production. These enterprises may be owned
either by corporations or government or private individuals.
Thus the activities of a person operating on his own account,
for example, a doctor or a lawyer, would, so far as his business
is concerned, be classified under 'firms'. The accounts of the
firms would reflect the productive activity in the community.

'Households' are all individuals or groups of persons, who
are normal residents and who receive payments for services
rendered by them to the 'firms', such as wage earners, salary
earners, property owners, businessmen etc. They also include
private non-profit institutions which are not established primarily
with the aim of earning a profit and whose product or service is
made available to the people free of any cost. This sector reflects
broadly the consumption activity in the community.




General Government comprises government agencies which
undertake such forms of activity as administration, education,
defence, health services etc. Such agencies may be owned by
the Central or the Local Government. The main test that is
to be applied to an agency for its inclusion in the General
Government is that its produce is not marketed and is made
available to the people free of any charges. The government
sector may be regarded as a special section of the consumption
activities of the economy in which purchases of goods and
services are made on behalf of the community as a whole, and
these purchases are made out of taxes collected from the people.
It is, therefore, like a 'household' but of a special kind.

Thus the first stage in the construction of Social Accounts 
is the division of the economy into the three sectors and the 
classification of the economic activity of the nation into these 
broad groups. 

The second stage is to distinguish the different forms of
economic activity within each sector. This is a functional
distinction and accordingly an economic activity may be viewed
in any of the three basic forms, viz., production, consumption
and saving or adding to wealth. Thus all transactions assigned
to a particular sector would be classified, according to their
functional distinctions, into one of the three accounts, one for
each of the three basic forms of economic activity. The
transactions which a particular sector may have with any other
sector of the economy or with the rest of the world will be
recorded in a separate account called the external account. In
this way there will be four types of accounts in each sector, viz.,

(i) The Production Account,

(ii) The Consumption or Appropriation Account,

(iii) The Savings or Resting Account, and

(iv) The External Account.
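
The two stages described above, three sectors each keeping four accounts, give twelve sector accounts in all. A minimal Python sketch of this cross-classification follows; the list names are illustrative assumptions.

```python
from itertools import product

# The three sectors and the four account types named in the text.
SECTORS = [
    "Firms",
    "Households and private non-profit institutions",
    "General Government",
]
ACCOUNT_TYPES = [
    "Production Account",
    "Consumption or Appropriation Account",
    "Savings or Resting Account",
    "External Account",
]

# Every sector keeps one account of each type.
sector_accounts = list(product(SECTORS, ACCOUNT_TYPES))

for sector, account in sector_accounts:
    print(f"{sector}: {account}")
print("Number of sector accounts:", len(sector_accounts))
```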

(i) The Production Account. The Production Account of a
sector shows the revenues and expenses connected with the
productive activity. On the credit side it shows the sale
proceeds and subsidies accruing to the sector and also an
item showing the increase in the stock held by the sector. On
the debit side it shows the expenses of production consisting of
materials purchased from outside the sector, indirect taxes,
depreciation and other operational reserves. The remaining item
on the debit side is the nett value added by the sector and
represents the amount of the gain accruing to the sector as a
result of its productivity. This includes the compensation
given to employees and operating surplus (i.e., rent, interest
and profit). The gain of the sector is transferred to:

(ii) The Consumption or Appropriation Account. The credit
side of the account, in addition to gain from productive
activity, shows the gain of the given sector from investments and
current transfers from other sectors, including the sector of
other countries. On the debit side this account shows the
appropriation of this total, viz., direct taxes, expenditure on
final consumption, transfers of income to other sectors and
the savings of this sector. The saving is transferred to:

(iii) The Savings or Resting Account. This account on its
credit side also receives depreciation and other operating
provisions of the sector and capital transfers and borrowing from other
sectors and the rest of the world. On the debit side are shown
the capital formation of the sector together with the capital
transfers and lendings to other sectors or to the rest of the world.

(iv) The External Account. It contains as credits all the
debit items of the other accounts of the sector which are not
also credit items of one of those accounts; and as debits all the
credit items of the other accounts of the sector which are not
also debit items of one of those accounts.
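
The way the balance of one account is carried to the next can be sketched numerically. In the Python example below every figure is hypothetical and the helper `balancing_item` is an assumption introduced for illustration; only the layout of credits and debits follows the description above.

```python
def balancing_item(credits, debits):
    """Amount needed on the debit side to make the two sides equal."""
    return sum(credits.values()) - sum(debits.values())


# (i) Production Account of a sector (hypothetical figures).
production_credits = {"sale proceeds": 1000, "subsidies": 20,
                      "increase in stock": 30}
production_debits = {"materials purchased": 400, "indirect taxes": 50,
                     "depreciation and other provisions": 60}
nett_value_added = balancing_item(production_credits, production_debits)

# (ii) Appropriation Account: receives the gain from production.
appropriation_credits = {"gain from productive activity": nett_value_added,
                         "income from investments": 10,
                         "current transfers received": 5}
appropriation_debits = {"direct taxes": 55, "final consumption": 400,
                        "transfers to other sectors": 20}
sector_saving = balancing_item(appropriation_credits, appropriation_debits)

# (iii) Resting Account: receives saving and the depreciation provision.
resting_credits = {"saving": sector_saving,
                   "depreciation and other provisions": 60,
                   "capital transfers and borrowing": 10}
resting_debits = {"capital formation": 120}
capital_transfers_and_lending = balancing_item(resting_credits, resting_debits)

print(nett_value_added, sector_saving, capital_transfers_and_lending)
```

Each balancing item appears as a debit in its own account and as a credit in the next, which is why the three accounts of a sector remain in balance.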

The twelve accounts of the three sectors are combined in such a
manner as to give the following six accounts. The process by
which these six accounts are obtained is not an outright
consolidation of the sector accounts. In certain cases
rearrangement of the entries has been made with the object of making
each of the six accounts relate to one of the familiar and
important aggregates.

1. The National Product and Expenditure Account. This is a
consolidated production account of the Nation. It takes the
following form :



            National Product and Expenditure Account

National Income                    | Consumers' expenditure
Depreciation and other             | Government current expenditure
  operating provisions             | Gross domestic asset formation
Indirect taxes                     | Sales of goods and services to the
Less: Subsidies                    |   rest of the world and factor income
                                   |   payments from the rest of the world
                                   | Less: Purchases of goods and services
                                   |   from the rest of the world and
                                   |   factor income payments to the
                                   |   rest of the world
-----------------------------------+--------------------------------------
Gross National Product             | Gross National Expenditure
  at Market Prices                 |   at Market Prices

If from this gross national product we deduct depreciation
and other operating provisions, we get Nett National Product at
market prices. Nett National Product at factor cost is
represented by the first item on the debit side.

2. The National Income Account, and the following two
accounts, viz., the Consolidated Appropriation Account for general
government and Consolidated Appropriation Account for households
are obtained by a rearrangement of the appropriation accounts
of the different sectors.

The National Income Account shows the manner in which the
total income of the community is appropriated. It takes the
following form :

                    National Income Account

Compensation to employees          | National Income
Income from property and           |
  entrepreneurship accruing        |
  to households                    |
Savings of Corporations            |
Direct taxes on Corporations       |
Government income from property    |
  and entrepreneurship             |
Less: Interest on public debt      |
-----------------------------------+--------------------------------------
National Income                    | National Income

3. The Consolidated Appropriation Account of General
Government shows income whether from property and entrepreneurship,
taxation or otherwise currently accruing to General
Government and the manner in which this total of current income is
allocated to current expenditure on goods and services, current
transfers to households or to the rest of the world and savings.
The following is its form:

      Consolidated Appropriation Account for General Government

Expenditure of Government on       | Government income from property
  goods and services               |   and entrepreneurship
Subsidies                          | Indirect taxes
Interest on Public Debt            | Direct taxes on households
Current transfers to households    | Direct taxes on Corporations
Current transfers to the rest      | Current transfers from the
  of the world                     |   rest of the world
Savings of Government              |
-----------------------------------+--------------------------------------
Government Exp. and Savings        | Current Revenue of General
  of General Government            |   Government

4. The Consolidated Appropriation Account for Households on the
credit side shows the income of the households gained from
participation in economic activity together with transfers
received from General Government and from the rest of the
world. The debit side of this account shows the manner in
which the income has been utilised. Its form is given below:

        Consolidated Appropriation Account for Households

Consumers' expenditure             | Compensation of Employees
Direct taxes on households         | Income from Property
Current transfers to the rest      | Current transfers from Government
  of the world                     | Current transfers from the
Savings of Households              |   rest of the world
-----------------------------------+--------------------------------------
Expenditure and Saving of          | Income of the Households
  Households                       |


5. The Consolidated Capital Transactions Account. It shows 
on the credit side the savings of corporations, households, 
general government, provision for depreciation and other
operational provisions, and nett capital transfers from the rest 
of the world. On the debit side it shows gross domestic asset 
formation and the nett increase in the foreign assets of the 
nation. By deducting from gross addition to National Wealth, 
depreciation and other operating provisions we arrive at nett 
addition to National Wealth. It takes the following form : 
















Consolidated Capital Transactions Account 



Depreciation and other operating Provisions.
Savings of Government.
Savings of Households.
Savings of Corporations.
Nett Capital transfers from the rest of the world.


Gross addition to National Wealth 


6. The Consolidated Account for the Rest of the World represents
a consolidation of the external accounts of the three sectors of
the national economy and shows all transactions that take
place between normal residents and foreigners. In contrast to
the other accounts, this account contains both current and
capital transactions. It, therefore, also shows the nett capital
grants the nation has received from abroad and is closed on
the credit side by an entry for the nett increase in the foreign
assets of the nation.

Consolidated Account for the Rest of the World

Purchases by rest of the world          Sales of goods and services to
  of goods and services from              the nation and factor income
  the nation and factor income            payment from the nation.
  payments to the nation.
Current transfers from outside          Current transfers from
  countries to Government.                Government.
Current transfers to households.        Current transfers from households.
Nett Capital transfers to the           Nett borrowing from the
  nation.                                 nation.

Total                                   Total


Uses of Social Accounts

From the foregoing description of the structure of social
accounts we can say that these accounts provide us with a
factual background and consequently are helpful in all fields of
economic decision-making. Their outstanding use has been in
connection with the formulation of the nation's economic
policy.







The formulation of economic policy involves, broadly
speaking, three stages. Firstly, it is necessary to assess the
probable situation as it would be if the economic forces were
allowed to move the way they are doing without any interference
from the government. Secondly, it is necessary to determine
the goal, i.e., to decide the extent to which we would like
the situation to be different from what it is expected to be.
And thirdly, it is necessary to formulate the measures needed to
bring the situation nearer to the desired goal. Both the assessment of
the probable situation and the formulation of the measures to
attain the desired economic goal will be aided by information
on the factors relevant to the situation and the manner in which
they are related to one another. Social accounting provides
us with this type of information. It gives, as has already been
explained, a systematic presentation of the major economic
flows in the framework of a comprehensive accounting system
and facilitates the understanding of the relationships among the
flows.

Another advantage that such accounting of a nation's
economy possesses is that “it channels the mind of the policy-maker
away from the immediate pressures of the specific
problems to a consideration of the whole economy, of the close
interrelation among its parts and of the links between the
present and future.”

Index Numbers 

Index Numbers are constructed in India for quite a wide 
range of economic subjects and their popularity is constantly 
increasing. Many official and non-official agencies compile 
and publish index numbers of agricultural production, indus¬ 
trial production, commodity prices, cost of living, etc. 

Index Number of Agricultural Production 

After World War II a large variety of index numbers
were started in India and one of these was the index
of agricultural production. Such an index gives an idea
about the trend of food supplies in the country. The following
series are available for the index of agricultural production in
India:

(1) Ministry of Food and Agriculture Series,

(2) Reserve Bank of India Index,

(3) Eastern Economist Index, and

(4) Food and Agricultural Organisation Index.

Ministry of Food and Agriculture Series

The Index Numbers of Agricultural Production are
compiled and issued by the Directorate of Economics and
Statistics, Ministry of Food and Agriculture. The first series
was issued through the January 1949 issue of the Directorate's
monthly journal 'Agricultural Situation in India'. The series
related to the years 1939-40 to 1945-46. The base period for
this index was the average of 1934-35 to 1938-39. The index
was based on 19 important agricultural commodities, arranged
in two major groups as follows:

Groups

(1) Foodgrains        Cereals:        Rice, jowar, bajra, maize, ragi,
                                      barley and gram.

(2) Non-food crops    Oil seeds:      Sesamum, groundnut, rape and
                                      mustard, linseed and castor-seed.
                      Fibres:         Cotton and jute.
                      Beverages:      Tea and coffee.
                      Miscellaneous:  Sugarcane and tobacco.

The index was obtained as the weighted geometric mean of
the production relatives calculated with reference to the base
period. The weights were taken to be proportional to the
value of the commodity produced in the base year. In
evaluating the value of production the harvest prices were
used, but where these were not available wholesale prices were
used. The combination of the two group indices (Food group
and Non-food group) gave the general index, and the weights
assigned to the food group and the non-food group were in the
proportion of 2 : 1.
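The weighted geometric mean described above may be illustrated by a short sketch. The production relatives and value weights below are hypothetical figures chosen only for illustration, and the use of a geometric mean for combining the two group indices in the ratio 2 : 1 is an assumption, since the text states only the weights for that step.

```python
import math


def weighted_geometric_mean(relatives, weights):
    """Weighted geometric mean: exp( sum(w * ln r) / sum(w) )."""
    total_weight = sum(weights)
    log_sum = sum(w * math.log(r) for r, w in zip(relatives, weights))
    return math.exp(log_sum / total_weight)


# Hypothetical production relatives (base period = 100) and value weights.
food_relatives = [104.0, 98.0, 110.0]
food_weights = [50.0, 30.0, 20.0]
nonfood_relatives = [95.0, 120.0]
nonfood_weights = [60.0, 40.0]

food_index = weighted_geometric_mean(food_relatives, food_weights)
nonfood_index = weighted_geometric_mean(nonfood_relatives, nonfood_weights)

# Food and non-food groups carry weights in the proportion 2 : 1.
# Averaging them geometrically is an assumption made for this sketch.
general_index = weighted_geometric_mean([food_index, nonfood_index], [2, 1])

print(round(food_index, 1), round(nonfood_index, 1), round(general_index, 1))
```

Because the logarithms are averaged, a relative of 50 and a relative of 200 with equal weights give an index of 100, whereas the arithmetic mean would give 125.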

This series was discontinued from the year 1949 in view
of the progress made in the compilation, coverage and technique
of crop estimates, and a fresh series was started from July 1954
with the agricultural year 1949-50 as base.

Revised Series. The revised series of the Index of Agricultural
Production covering 28 principal crops with 1949-50 as the base
is being issued by the Directorate of Economics and Statistics,
Ministry of Food and Agriculture. The 28 principal crops
covered by the index are divided into two main groups and
seven sub-groups as follows:

(A) Foodgrains:

(1) Kharif cereals: Rice, jowar, bajra, maize, ragi and
small millets.

(2) Rabi cereals: Wheat and barley.

(3) Pulses: Gram, tur and other pulses.

(B) Non-food crops:

(1) Oilseeds: Groundnut, sesamum, rapeseed and mustard,
linseed and castor-seed.

(2) Fibres: Cotton, jute and mesta.

(3) Plantation crops: Tea, coffee and rubber.

(4) Miscellaneous crops: Sugarcane, pepper, potato,
tobacco, ginger and chillies.

Weighting. Weights have been assigned to the different
crops in proportion to the total value of production of each
crop during the base period. The evaluation of production
has been done at the annual harvest prices prevailing during
the year, and for any year in which reliable harvest prices were not
available, wholesale prices have been used after making appropriate
allowances. The following table shows the classification
of commodities and the weights assigned to the various sub-groups:





[Table: classification of commodities, weights assigned to the various sub-groups and index numbers of agricultural production; the figures are illegible in this copy.]

Source: Reserve Bank of India Bulletin, November 1964.








Note: The indices for 1960-61 to 1962-63 are generally
based on partially revised estimates while those for 1963-64
are generally based on final estimates. The indices for these
years are, therefore, subject to revision.

Technique of Construction. There has been a change in the
method of construction. The index numbers are constructed
on a chain basis in the first instance. For each crop the all-India
production for a year is expressed as a production
relative with the production of the previous year as the base.
Subsequently these production relatives are linked to the base
year production to yield the production index for that crop.
The sub-group, the group and the all-commodity index numbers
are arrived at by finding the weighted arithmetic average
of these crop production indices.
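The chain method described above can be sketched as follows. The crop names, outputs and weights are hypothetical and serve only to show the arithmetic: each year's output is first expressed as a relative on the previous year, the relatives are then linked back to the base year, and the crop indices are finally combined by a weighted arithmetic average.

```python
def chain_linked_index(base_production, yearly_production):
    """Return the production index (base year = 100) for each later year,
    built by linking year-on-year production relatives."""
    indices = []
    previous_output = base_production
    previous_index = 100.0
    for output in yearly_production:
        # Production relative with the previous year as base (= 100).
        link_relative = output / previous_output * 100.0
        # Linking the relative to the base year through last year's index.
        current_index = previous_index * link_relative / 100.0
        indices.append(current_index)
        previous_output, previous_index = output, current_index
    return indices


def weighted_arithmetic_average(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical outputs (million tons): base year, then two later years.
rice_index = chain_linked_index(20.0, [21.0, 22.05])
wheat_index = chain_linked_index(10.0, [9.0, 11.0])

# Hypothetical weights proportional to base-period value of production.
crop_weights = [3.0, 1.0]
sub_group_index = [
    weighted_arithmetic_average([r, w], crop_weights)
    for r, w in zip(rice_index, wheat_index)
]
print([round(x, 2) for x in sub_group_index])
```

For a crop whose coverage does not change, the chained index equals the direct ratio of current to base-year output; the chain basis matters when the coverage of the estimates changes from year to year.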

The Reserve Bank of India Index of Agricultural Production

This series of index numbers, compiled annually by the
Reserve Bank of India, is published in the December issue of
its monthly bulletin entitled 'Reserve Bank of India Bulletin'.
The base for this series was formerly the average of production
in 1936-37 to 1938-39. The indices are available with this
base for undivided India from 1939-40 to 1946-47 and for the
Indian Union after partition from 1947-48 to 1948-49. In
1949-50 the base year for the series was changed to the year
1948-49 and since then the series has been available for the Indian
Union on this base.

The Index Number is based on 17 commodities distributed
over 5 major groups, namely foodgrains, oilseeds, beverages,
fibres and others. Separate indices are available for each
group and for all the seventeen commodities as a whole. The
index is weighted, the weights being proportional to the value of the
crop production. The various items and their weights are as
follows:





Group             Commodity              Weights

(1) Foodgrains    Rice                      38
                  Jowar and Bajra           12
                  Maize                      2
                  Ragi                       2
                  Wheat                     14
                  Barley                     4
                  Gram                       7
                  Total                     79

(2) Beverages     Tea                      4.0
                  Coffee                   0.4
                  Total                    4.4

(3) Oilseeds      Sesamum                    1
                  Groundnut                  7
                  Rape and Mustard           2
                  Linseed                    1
                  Castor-seed              0.3
                  Total                   11.3

(4) Fibres        Cotton                     3
                  Jute                       2
                  Total                      5

(5) Other         Rubber                   0.1

                  Grand Total             99.8


The Eastern Economist Index. 'Eastern Economist', which is
a weekly non-official journal published from Delhi, for the first
time brought out a series of Index of Agricultural Production
in its special budget number of 1952-53. The base period is
1936-37 to 1938-39 and the series is available from 1939-40.
This series is composed of 14 commodities which are classified
into 4 major groups as follows:







Foodgrains     : Rice, wheat, millets, gram.

Fibres         : Cotton, jute.

Oilseeds       : Sesamum, groundnut, rape and mustard, linseed.

Miscellaneous  : Sugarcane, tobacco, tea, coffee.

This index is also weighted and the weights are assigned
in proportion to the values of commodities produced during the
base period.

The F. A. O. Index. The index numbers of agricultural production
of various countries including India are compiled
by the Food and Agricultural Organisation of the United
Nations and published in the F. A. O. Year Book. The
F. A. O. indices are compiled with the object of making inter-country
comparisons of agricultural production. The base
period for the F. A. O. index is 1934-38. The index number
relates to gross production of a large number of commodities
common to most of the countries. These commodities are
divided into 11 groups for which separate figures of index
numbers are available.

The weighting system is very complicated. Weights are
based on Wheat Relative Prices. The price of each commodity
is calculated in terms of gold francs per metric ton and then
converted to a wheat relative price, the price of wheat being
100 gold francs per metric ton in 1934-38. Weights are
assigned on the basis of such commodity prices expressed in
terms of wheat.

Index Numbers of Commodity Prices

Index numbers are available in India for harvest prices, 
wholesale prices and retail prices. 

Index numbers of harvest prices of chief crops are compiled
by the Directorate of Economics and Statistics, Ministry of Food
and Agriculture, and are published for the period of 1939-40
to 1949-50 in the Indian Agricultural Price Statistics, 1950-51.

Index numbers of wholesale prices are compiled by the
office of the Economic Adviser, Ministry of Commerce and
Industry and are published in the Index Number of Wholesale
Prices in India, a weekly bulletin issued by the Economic Adviser.
The Department of Commercial Intelligence and Statistics,
Ministry of Commerce and Industry, also compiles and publishes
regularly in the Indian Trade Journal¹ the index numbers
of wholesale prices in Calcutta.

Official Index Number of Wholesale Prices in India is published
by the office of the Economic Adviser to the Government of
India, Ministry of Commerce and Industry.

In 1939, the office of the Economic Adviser to the Government
of India introduced a series of Weekly Index Number of
wholesale prices in India with the week ending 19th August, 1939
as base. The weekly index was of the sensitive type and was
worked out as the simple geometric average of the price
changes of 23 selected commodities. This series was, however,
discontinued in 1948, and since 1947 a General Purpose
Index Number of wholesale prices has been prepared.

This index is based on the prices of 78 commodities which
are arranged in 18 sub-groups and 5 important economic groups.
The base period for the index is the year ended August 1939.

The weekly index is calculated for one-day-a-week prices on 
or about Friday. In order to secure a representative character 
for the index, particularly from the point of view of markets 
included, several varieties have been included in the case of 
many commodities. In all a total of about 230 quotations are 
taken into account in the compilation. For the most part 
prices are those charged by manufacturers or importers or 
those prevailing in main markets. 

The average used is the weighted Geometric Mean of the 
price relatives. 

The weights assigned to the various commodities in the
index are proportionate to the total values of the commodities
as determined from the estimated quantities marketed and the
prices prevailing during 1938-39. Thus in the case of agricultural
commodities and other industrial raw materials the

¹ This has now been suspended.





estimated quantities retained by the producers have been left
out of account. In regard to manufactured and semi-manufactured
articles, it has been assumed that the entire production
was put on the market.

Official Index Number of Wholesale Prices (Rev. Series)

Owing to the partition of the country and subsequent developments,
the system of weighting of the official index of
wholesale prices of 78 commodities became inappropriate as it
was based on the data of the entire sub-continent. Further, the
base period, viz., the year ended August 1939, of the existing
series is not quite suitable now as a lot of changes both in output
and consumption have taken place since the outbreak of World
War II and more specially after the launching of the Five Year
Plans. Moreover, during recent years, the availability of
price data has considerably increased. Consequently, it was
decided to work out a revised series with wider coverage and
an appropriate base.

Coverage of the Revised Series

During recent years the country has made very rapid progress
in many fields including the field of price statistics. Consequently,
it became possible to raise the number of commodities
from 78 to 112 and the number of quotations from 215 to 555
in the revised series as compared to the old series.

The new commodities included in the revised series are : 
barley, maize, ragi, potatoes, onions, oranges, bananas, milk, 
ghee, fish, eggs, meat, sugarcane, hemp, tanning materials, 
lubricating Oils, aviation spirit, diesel oil, electricity, bamboos, 
aluminium, tin, lead, German silver, handloom cloth, hosiery 
goods, coal tar products, medicines, tools, bobbins, leather 
belting, cycles, pottery goods and lime. 

The enlarged list of quotations has improved the commodity-wise
distribution of quotations, and the inclusion of new markets
has bettered the statewise distribution. The following table
shows the groupwise distribution of commodities, markets and
quotations:




Group                                      No. of        No. of     No. of
                                           Commodities   Markets    Quotations

1. Food                                        31          105         216
2. Liquor and Tobacco                           3            5          10
3. Fuel, Power, Light and Lubricants            8            7          24
4. Industrial Raw Materials                    23           37          84
5. Intermediate Products                       14            7          44
6. Finished Products                           33           22         177

   Total                                      112          183         555


The choice of specifications and markets for commodities is
based on an examination of each commodity regarding its
place in the national economy and the representative character
of the markets and varieties.

The Thapar Committee's (Agricultural Prices Enquiry
Committee) recommendation regarding a list of 99 markets
for cereals for inclusion in the index has been incorporated in
this series. For other commodities the selection of markets
and specifications has been finalised in consultation with
Chambers of Commerce, leading manufacturers, etc., besides
the Central and State Governments.

Sources of Information. The price data are collected through
official as well as non-official agencies. The official sources are:
Agricultural Marketing Department, Registrar of Co-operative
Societies, Economics and Statistics Department, District and
Sub-Divisional Officers, Forest Officers and other primary
agencies belonging to the State Governments, Directorate of
Economics and Statistics, Collectors of Customs, Central Commodity
Committees, State Bank of India, etc. The non-official
agencies are Chambers of Commerce, Trade Associations and
leading business houses. The table given on page 762 shows the
sourcewise distribution of the quotations.

Out of the total of 555 quotations included in the revised
series, 295 quotations are received from official sources and
the remaining 260 are obtained through non-official agencies.





The majority of the officially obtained quotations deal with
agricultural commodities and that of the non-officially obtained
quotations with non-agricultural commodities. Of the 260
non-officially obtained quotations, 110
are collected by mail or by the office staff directly from the
manufacturers or dealers of the respective commodities. The
primary agencies for most of the officially collected data are
the various State Government Departments and, although the
majority of the States collect wholesale prices of agricultural
commodities on a fairly comprehensive scale, they hardly
collect price data of manufactured and non-agricultural commodities.
Consequently most of the price quotations for manufactured
and non-agricultural commodities are obtained
through non-official sources.

Quality of Data. Various checks are applied for ensuring the
accuracy of the price data. Every effort is made to trace
abnormal variations or inconsistencies through a careful scrutiny,
and such discrepancies are immediately brought to the notice
of the reporting authorities for proper explanation and reconciliation.
The office of the Economic Adviser also gets price data
for a large number of items from the Commercial and Price
Intelligence Branches of the C. G. I. & E's offices at the ports
of Calcutta, Bombay and Madras for checking purposes.
Weekly prices are also obtained from other sources which are
not used in the compilation of the indices. Such price data are
utilised for checking the accuracy of the prices (obtained
through varied sources) included in the list of the index series.
Yet another check applied is the comparison of the prices and
the trends of prices of the same commodity prevailing in different
markets.

Choice of Base. The chief considerations for the selection of
the base year for the revised index of wholesale prices have
been that it should be (i) a post-war and post-partition year of narrow
fluctuations in prices, and (ii) as close as possible to the commencement
of the First Five-Year Plan. A study of the price changes during
the period 1947-53 revealed that there were wide fluctuations in
the price structure. Prices rose steeply during January-July,
1948, following the decontrol in December 1947, with a further
stimulating effect on prices due to the devaluation of the rupee
in September, 1949, and reached the peak level in April, 1951
as a result of the outbreak of the Korean War. Thereafter the
prices showed an almost uninterrupted declining trend till
March 1952 when the prices descended heavily. Thus the
two years, viz., the year ending August 1949 and the fiscal
year 1952-53, were the only intervals of minimum price fluctuations
during the entire post-war period. The Standing Committee
of Departmental Statisticians set up a Working Party in
1952 to recommend a common base period for all official
index numbers. The Working Party found the year 1952-53 to be one of
more or less all-round stability and consequently suggested
bringing all official index numbers to the common base of 1952-53
as far as possible. The year ending August 1949 was not
found suitable as a base period as the price data of many of the
centres recommended by the Agricultural Prices Enquiry Committee
(the Thapar Committee recommended a list of 99 centres
for cereals for inclusion in the revised series) were not available,
and consequently it was found expedient to take the fiscal year
1952-53 as the comparison base for the revised series.

Commodity Classifications. The revised series follows the 'Commodity
Classification' as recommended in the Standard International
Trade Classification with slight modifications to fit in
with the situation of the country. With the exception of two
groups, viz., 'Food Articles' and 'Industrial Raw Materials', all
the groups are either new or modified in the revised series as
compared to the old series. The new groups are 'Liquor
and Tobacco' and 'Fuel, Power, Light and Lubricants'. The
'Manufactures' group of the old series has been broken up into
two separate groups of 'Intermediate Products' and 'Finished
Products' in the new series. The 'Miscellaneous' group under
the old series has been apportioned into the other groups identifiable
in terms of commodities.

The following is the set of new groupings:

1. Food Articles,

2. Liquor and Tobacco,

3. Fuel, Power, Light and Lubricants,

4. Industrial Raw Materials,

5. Intermediate Products, and

6. Finished Products.

The above group of 'Food Articles' includes, besides items
in the old group, edible oils shown under 'Semi-manufactures'
in the old series, vanaspati, cashew nuts and spices included
under the 'Miscellaneous' group in the old series, and the new
items barley, maize, ragi, potatoes, onions, oranges, bananas,
milk, ghee, fish, eggs and meat.

The group of 'Intermediate Products' is comparable with
the 'Semi-manufactures' group of the old series except for the
difference that mineral oils, vegetable oils (excluding linseed
oil), timber and oilcakes have now been transferred to other
groups of the new index.

A comprehensive list of 112 commodities divided into 6
groups and sub-divided into 26 sub-groups, along with the
weight assigned to each, is as follows:


Group / Sub-Group / Commodity                         Weight

1. Food Articles                                       1,000
   (a) Cereals                                           382
        (i) Rice                                         224
        (ii) Wheat                                       106
        (iii) Jowar                                       19
        (iv) Bajra                                        10
        (v) Barley                                        10
        (vi) Maize                                         9
        (vii) Ragi                                         4
   (b) Pulses                                             84
        (i) Gram                                          30
        (ii) Other Pulses                                 54
   (c) Fruits and Vegetables                              43
        (i) Potatoes                                       9
        (ii) Onions                                        3
        (iii) Oranges                            [illegible]
        (iv) Bananas                                      24
        (v) Cashew nuts                                    3
   (d) Ghee and Milk                                     167
        (i) Milk                                          93
        (ii) Ghee                                         74
   (e) Edible Oils                                        93
        (i) Groundnut                                     38
        (ii) Gingelly                                      9
        (iii) Mustard                                     26
        (iv) Coconut                                       9
        (v) Vanaspati                                     11
   (f) Fish, Eggs and Meat                                34
        (i) Fish                                          10
        (ii) Eggs                                          8
        (iii) Meat                                        16
   (g) Sugar and Gur                                      95
        (i) Sugar                                         35
        (ii) Gur                                          60
   (h) Others                                            100
        (i) Tea                                           38
        (ii) Coffee                                        3
        (iii) Spices                                      43
        (iv) Betelnuts                                    11
        (v) Salt                                           5

2. Liquor and Tobacco                                  1,000
   (a) Liquors: Liquors                                   59
   (b) Tobacco and Tobacco Manufactures                  941
        (i) Tobacco, Raw                                 700
        (ii) Tobacco, Manufactured                       101

3. Fuel, Power, Light and Lubricants                   1,000
   (a) Coal: Coal                                        297
   (b) Mineral Oils                                      479
        (i) Kerosene Oil                                  86
        (ii) Petrol                                      247
        (iii) Aviation Spirit                             24
        (iv) Diesel Oil                                   58
        (v) Lubricating Oils                              64
   (c) Electricity: Electricity                          171
   (d) Castor Oil: Castor Oil                             53

4. Industrial Raw Materials                            1,000
   (a) Fibres                                            393
        (i) Cotton                                       204
        (ii) Jute                                        150
        (iii) Hemp                                        14
        (iv) Wool                                         13
        (v) Silk                                          12
   (b) Oil Seeds                                         388
        (i) Groundnut                                    177
        (ii) Linseed                                      32
        (iii) Castor-seed                                 10
        (iv) Gingelly seed                                31
        (v) Rape seed                                     67
        (vi) Cotton Seed                                  35
        (vii) Copra                                       36
   (c) Minerals                                           14
        (i) Iron Ore                                       2
        (ii) Mica                                          9
        (iii) Manganese Ore                                3
   (d) Others                                            205
        (i) Hides                                         26
        (ii) Skins                                        16
        (iii) Tanning Materials                            2
        (iv) Sugarcane                                    65
        (v) Rubber                                         5
        (vi) Lac                                          16
        (vii) Logs and Timber                             72
        (viii) Bamboos                                     5

5. Manufactured Articles                               1,000
   (a) Intermediate Industrial Products                  121
        (i) Linseed Oil                                   19
        (ii) Leather                                      25
        (iii) Cotton Yarn                                 53
        (iv) Rayon Yarn                                   13
        (v) Coir Yarn                                      7
   (b) Metals                                             26
        (i) Pig Iron                                       3
        (ii) Semis                                         3
        (iii) Aluminium                          [illegible]
        (iv) Brass                                         5
        (v) Zinc                                           3
        (vi) Copper                                        6
        (vii) Tin                                          1
        (viii) Lead                                        1
        (ix) German Silver                                 1

6. Finished Products                                     853
   (a) Textiles                                          506
        (i) Cotton Manufactures                          313
            (i) Cloth (Mill)                             240
            (ii) Cloth (Handloom)                         70
            (iii) Hosiery                                  3
        (ii) Jute Manufactures                           126
        (iii) Woollen Manufactures                        12
        (iv) Silk and Rayon Manufactures                  55
   (b) Metal Products                                     41
        (i) Finished Steel                                38
        (ii) Aluminium Utensils                            3
   (c) Chemicals                                          70
        (i) General Chemicals                             18
        (ii) Coaltar Products                    [illegible]
        (iii) Dyeing Materials                   [illegible]
        (iv) Paints                                        8
        (v) Medicines                            [illegible]
        (vi) Soap                                         19
        (vii) Fertilizers                        [illegible]
   (d) Oilcakes: Oilcakes                                 99
   (e) Machinery and Transport Equipment                 106
        (i) Machinery                                     66
        (ii) Tools                                         2
        (iii) Bobbins                                      2
        (iv) Leather Belting                               1
        (v) Vehicles                                      19
        (vi) Cycles                                        4
        (vii) Rubber Tyres and Tubes                      12
   (f) Other Products                                    103
        (i) Rubber Shoes                                   3
        (ii) Leather Shoes                                23
        (iii) Matches                                     12
        (iv) Paper and Newsprint                          21
        (v) Bricks and Tiles                              18
        (vi) Plywood Tea Chests                            3
        (vii) Pottery Goods                                2
        (viii) Glass                                       8
        (ix) Cement                                       12
        (x) Lime                                           1

Weighting System. The weights assigned to various commodities
are based on the estimates of marketed values of domestic
produce and the values of imports inclusive of duty. Weights
of manufactures have been determined on the data of the gross
value of products as obtained at the Third Census of Indian
Manufactures 1948 (imports have also been taken into account).
In the case of intermediate products, only the portion produced
for sale has been considered. The weight of electricity is based
on the energy sold by electricity enterprises and valued at the
average all-India rate. Petroleum data are based on consumption
figures.

The weight base and the price comparison base are of
different periods. The weights refer to the post-partition
period 1948-49 and the comparison base rests on the fiscal year
1952-53. The weight base is thus different from the price
comparison base as anticipated by the U. N. Statistical Commission.
The Working Party (set up by the Standing Committee
of the Departmental Statisticians) also saw no reason as to
why the weight base and the comparison base should necessarily
be the same.

Under the new weighting, the relative
importance of the groups in the revised series has changed
in comparison to the old series. A comparative study of the
weights assigned to each group in the revised and the old
series can be made from the following table:



Groups                                     New Revised     Old
                                           Series          Series

1. Food Articles                              50.4          31.0
2. Liquor and Tobacco                          2.1           -
3. Fuel, Power, Light and Lubricants           3.0           -
4. Industrial Raw Materials                   15.5          18.0
5. Manufactured Articles*                     29.0          47.0**
6. Miscellaneous                               -             4.0

   Total                                     100.0         100.0


There has been a substantial change in the weights of
the Food Articles and Manufactured Articles. In the new
series, the weight for the food group has risen from 31.0%
to 50.4% while that of manufactured articles has decreased
from 47% to 29%. In the new series the weight of non-food
groups is only 49.6% as against 69% in the old series. This
shifting of weights from non-food was mainly due to the addition
of a large number of new commodities under food and the
transfer of edible oils, vanaspati, spices and cashew nuts from
the non-food groups in the existing series to the food group in
the revised series.

* In the new series, manufactured articles comprise
(1) Intermediate Industrial Products, and
(2) Finished Products.

** Semi-Manufactures 17.0 and Manufactures 30.0.






The geometric mean has been relegated to the background.
In the new series, the weighted arithmetic mean has been
preferred as against the geometric average used in the old
series.

Various steps in calculation leading finally to the group 
index are: 

(1) The collection of the weekly quotations for the prescribed
varieties prevailing on about Friday ;

(2) The calculation of the price relatives as the percentage
ratios which current price quotations bear to those
prevailing in the base period ;

(3) The computation of the commodity index based on
the simple arithmetic mean of the price relatives of
varieties ;

(4) Derivation of the sub-group or group index as weighted
arithmetic average of commodity indices ; and finally,

(5) The compilation of the all commodities index as 
weighted arithmetic average of the group indices. 
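
The steps listed above can be sketched in a few lines of code. This is only an illustration: the commodities, quotations and weights below are hypothetical figures chosen for easy arithmetic, not figures from the official series, and the function names are assumptions made for this sketch.

```python
def price_relative(current_price, base_price):
    """Step (2): current quotation as a percentage of the base-period quotation."""
    return 100.0 * current_price / base_price


def simple_mean(values):
    values = list(values)
    return sum(values) / len(values)


def weighted_mean(indices, weights):
    """Weighted arithmetic average of the indices, keyed by name."""
    total_weight = sum(weights[name] for name in indices)
    return sum(indices[name] * weights[name] for name in indices) / total_weight


# Step (1): hypothetical quotations, (current price, base price) for each variety.
quotations = {
    "rice": [(12.0, 10.0), (22.0, 20.0)],
    "wheat": [(9.0, 10.0)],
}

# Step (3): commodity index = simple arithmetic mean of the variety relatives.
commodity_indices = {
    name: simple_mean(price_relative(cur, base) for cur, base in varieties)
    for name, varieties in quotations.items()
}

# Step (4): group index = weighted arithmetic average of commodity indices.
commodity_weights = {"rice": 3.0, "wheat": 1.0}  # hypothetical weights
food_index = weighted_mean(commodity_indices, commodity_weights)

# Step (5): all commodities index = weighted average of the group indices.
group_indices = {"food": food_index, "other": 100.0}  # "other" is assumed
group_weights = {"food": 60.0, "other": 40.0}         # hypothetical weights
all_commodities_index = weighted_mean(group_indices, group_weights)
```

With these figures the rice index is the mean of 120 and 110, and the wheat index is 90, so the food group index lies between the two, nearer to rice because rice carries the larger weight.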

In cases of missing quotations, the general trends obtaining
in the other varieties are used in estimating the commodity
index. If a particular quotation is not at all expected to be
obtained in the future, another representative and comparable
variety is introduced for replacement. The missing commodities
are also given the same treatment as the missing varieties.
The index of the missing variety is imputed to the new variety
for the latest common week for which quotations for both
varieties are available.

Problem of Continuity

From the viewpoint of comparative analysis, the problem
of continuity invariably arises whenever a new series
of index numbers is to be compiled. The out-dated and the
new series may be linked by suitable conversion factors for the
base period in respect of individual commodity items and the
general index. The general indices of the old and the revised
series can be linked on the basis: 100 of the new series = 380.6
(being the average for 1952-53) of the old series. So far as
group comparison is concerned, some limited comparability
continues even though it is not possible to establish complete
one-to-one correspondence between the new and the old groups.
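
The linking of an old and a new series by a conversion factor may be illustrated as follows. The sketch assumes only that the value of the old series in the linking period is known. The figure of 400 used here is a hypothetical round number for easy arithmetic, and the function name is an assumption of this sketch.

```python
def link_old_to_new(old_index, old_value_in_link_period, new_value_in_link_period=100.0):
    """Express a figure of the old series on the scale of the new series.

    The conversion factor is the ratio of the two series in the period
    for which both are available (the linking period).
    """
    conversion_factor = new_value_in_link_period / old_value_in_link_period
    return old_index * conversion_factor


# Hypothetical: the old series averaged 400 in the period taken as 100 in the new series.
old_series = {"link period": 400.0, "a later month": 420.0}
linked = {
    period: link_old_to_new(value, old_value_in_link_period=400.0)
    for period, value in old_series.items()
}
```

A figure of 420 in the old series thus corresponds to 105 on the new scale. The conversion changes the scale only and leaves the relative movement of the old series untouched.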

Consumer Price Index Numbers*

Concept and Scope. Consumer price index numbers are
designed to measure by means of appropriate weighting the
average change over time in the prices paid by the ultimate
consumers for a specified quantity of goods and services. It
should, however, be clearly understood that the consumer price
indices measure changes in the cost of living of workers due to
changes in the retail prices only. The measurement of changes
in the cost of living due to change in the living standards is not
included in the usual concept of the consumer price index.

In defining the scope of a consumer price index it is
necessary to specify :

(1) The population groups covered, e.g., working class,
middle class, etc., and

(2) The geographical areas covered, e.g., urban areas, rural
areas, a city, town, etc.

Functions of Consumer Price Indices. The main function of a
consumer price index is to serve as a measure of change in the
retail prices of a specified quantity of goods and services. But
such indices are useful in many other ways. They help in wage
negotiations and dearness allowance adjustments, etc. Governments
can make use of such indices in framing wage policy,
price policy, rent control, taxation and general economic
policies. Changes in the purchasing power of money and real
income can be measured and markets for particular kinds of
goods and services can be analysed with the help of these
indices.

*For detailed study consult 'Consumer Price Index Numbers', monograph
issued by Labour Bureau, Ministry of Labour, Government of India (19W).
Consumer Price Index was formerly called 'Cost of Living Index'. The
change in the name was made in accordance with international recommendations
and the growing practices in other countries.

Precautions in the Use of a Consumer Price Index

Before making use of a consumer price index it is necessary
to inquire into the following :

(1) Scope of the Index. The class of the people and the area
to which the consumer price index is related must be carefully
determined.

(2) Reliability of the Index Number should be carefully ascertained.
The reliability of an index depends mainly upon the
reliability of the price data used and sampling technique
adopted. The sample of the households covered in the course
of a family budget inquiry should be representative of the
population group. Similarly the items selected for pricing
should be representative of all the items in the average budget.
The localities for which price data are collected should be
representative of all localities from which the population group
makes its purchases, and the retail outlets from which prices
are collected should be representative of all retail outlets used
by the population group.

Problems in the Construction of Consumer Price Index
Number

The main problems in the construction of Consumer Price
Index consist of :

(i) The determination of weights, and 

(ii) The collection of retail prices. 

(i) Determination of Weights. In general, weights are determined
on the basis of the consumption pattern of the class of
population to which the index relates. This means that the
weights that are assigned to different commodities are related
to their actual consumption or expenditure upon them.
Statistical data relating to consumption or expenditure are
derived from family budget inquiries. It is, therefore, necessary
that such inquiries be properly planned. Since a complete
count of all the families included in the area is not practicable,
it is of vital importance that a sound sampling method be
adopted. On the basis of the results of the family budget
inquiries, an average budget of the expenditure on different
items consumed by families of different size and composition
included in the study and of quantities of the different items
consumed by them is derived. This average budget is representative
of the population group to which the consumer price index
is to finally relate. It covers all groups of consumption expenditure:
food, housing, clothing and miscellaneous. It has been
recommended by the Seventh International Conference of
Labour Statisticians that for purposes of international comparison
the classification of consumption expenditure should be
made in such a way as to make possible a grouping or regrouping
of items in the following groups and sub-groups :

(a) a group of food, including, as separate items, food 
consumed away from home and alcoholic beverages ; 

(b) a group of housing, including, as separate items, rent,
fuel and light, and household furnishings and appliances ;

(c) a group of clothing ; and

(d) a group of miscellaneous, including, as separate items,
the following ten sub-groups: medical care, personal
care, insurance and other contributions, education and
reading, postage, recreation, tobacco, gifts and charities.

The conference also recommended that items of non-consumption
outgo (income and similar taxes and interest on
personal debts) should be separated from the items of consumption
expenditure ; and thus items like taxes, interest on debts,
purchase of savings certificates, insurance premiums, etc., should
be excluded from the items of consumption expenditure.

The various items that are included in each consumption
group of the average budget are then assigned weights in
proportion to their importance within that group, on the basis
of the figures of either expenditure or quantity of consumption
in the average budget.

Though all important items of expenditure are included in
the index, the sample of items selected for pricing has to be
limited because the larger the list of items, the greater the
time and labour involved in the collection of prices and the
computation of the index. Moreover, no loss in accuracy will
result if out of a few items having similar price trends only one
is selected for pricing and the weights of all such items are
assigned to the priced item. Thus if from a study of price
behaviour and other factors, it is established that a particular
unpriced item has a price trend similar to that of a priced
item, the weight of the unpriced item is added to that of the
priced item.

On this principle, the weights of individual items and 
consumption groups, based on expenditure data, can be derived 
as follows : 

To the expenditure on each priced item is added the
expenditure on unpriced items known to have similar price
movements. The resultant expenditure on each item is expressed
as a percentage of the total expenditure accounted for by
all the items included in a group to yield the weight of the item
within the group. The weight of a consumption group is
obtained by expressing the total expenditure on the group as
a percentage of the total expenditure on all groups as recorded
in the average budget.
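
The derivation of weights described above may be sketched as follows. The expenditure figures, the item names and the assignment of maize to wheat are all hypothetical, introduced only to show the arithmetic. The function names are assumptions of this sketch.

```python
def item_weights(priced_expenditure, unpriced_expenditure, similar_to):
    """Percentage weights of the priced items within one consumption group.

    `similar_to` maps each unpriced item to the priced item whose price
    movement it is known to follow; its expenditure is added to that item.
    """
    totals = dict(priced_expenditure)
    for unpriced_item, priced_item in similar_to.items():
        totals[priced_item] += unpriced_expenditure[unpriced_item]
    group_total = sum(totals.values())
    return {item: 100.0 * spent / group_total for item, spent in totals.items()}


def group_weights(group_expenditure):
    """Percentage weight of each consumption group in the average budget."""
    grand_total = sum(group_expenditure.values())
    return {group: 100.0 * spent / grand_total
            for group, spent in group_expenditure.items()}


# Hypothetical average budget (rupees per month).
food_items = item_weights(
    priced_expenditure={"rice": 30.0, "wheat": 20.0},
    unpriced_expenditure={"maize": 10.0},
    similar_to={"maize": "wheat"},
)
groups = group_weights(
    {"food": 60.0, "housing": 10.0, "clothing": 15.0, "miscellaneous": 15.0}
)
```

Because the expenditure on the unpriced item is transferred and not dropped, the item weights within a group still add up to 100.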

(ii) Collection of Retail Prices. The second main problem
in the construction of the cost of living index number
is the collection of retail prices of the items in the index. The
prices are to be collected both for the base period and the
current period.

Some of the principles that should be observed in the 
collection of retail prices are mentioned below : 

1. The work on the collection of retail prices should
commence simultaneously with the conduct of family budget
inquiry, because the retail prices are required both for the base
period and the current period, and the base period has to
synchronise more or less with the period of the family budget
inquiry. Since the selection of the items to be priced can be
finished only after the completion of the average budget, retail
prices, to begin with, are to be collected for a list of items on
the basis of general knowledge of the consumption habits of
the population group covered. This initial list should be
sufficiently large. Unimportant items can be dropped at a
later stage.

2. When the average budget is ready it would be possible
to fix a list of items that are to be priced. For each item that
is priced, a standard of quality should be fixed by means of
suitable specifications.

It may be pointed out here that since in practice neither
market conditions nor consumers' preferences remain unchanged
over a period of time, it becomes difficult to conform to a
fixed list and constant qualities and quantities of goods and
services. Now the quantities consumed (which means the
same thing as weights assigned to different commodities) cannot
be changed without conducting fresh family budget
inquiries. The Sixth International Conference of Labour
Statisticians recommended that the pattern of consumption
should be examined and the weights adjusted, if necessary, at
intervals of not more than ten years to correspond with the
changes in the consumption pattern. The conference also
recommended the use of small sample studies of consumer purchases
in the intervals between the more comprehensive surveys
for discovering significant changes in consumption pattern to
indicate the need for revision in the weights.

Changes in the quality of priced goods and services are
more frequent and when a marked change in the quality of an
item occurs, an appropriate adjustment has to be made to
ensure that the index takes into account only real changes in prices.
Such adjustment can be made in the following ways :

If prices are not available for old qualities over a period,
the method of linking may be adopted, i.e., the price of the old
quality may be estimated on the basis of the trends of the
prices of the new quality. Quality differences can be evaluated
in terms of prices in consultation with the traders and only
that part of the difference between the quotations for old and
new qualities which represents a real price movement may be
taken into account.
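
One simple reading of the method of linking is sketched below: the last quotation available for the old quality is carried forward in proportion to the movement of the new quality's price. The prices are hypothetical, the function name is an assumption of this sketch, and the text does not prescribe this exact formula.

```python
def estimate_old_quality_price(last_old_price, new_price_then, new_price_now):
    """Estimate today's price of the old quality from the trend of the new one.

    last_old_price : last quotation obtained for the old quality
    new_price_then : price of the new quality in that same period
    new_price_now  : current price of the new quality
    """
    trend = new_price_now / new_price_then
    return last_old_price * trend


# Hypothetical quotations: the new quality has risen from 10.0 to 11.0.
estimated = estimate_old_quality_price(
    last_old_price=8.0, new_price_then=10.0, new_price_now=11.0
)
```

The difference between 8.0 and 10.0 in the overlapping period is treated as a difference of quality and does not enter the index. Only the ten per cent rise is counted as a price movement.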

It is, however, recognised that the detection of certain changes
in the quality which are not sudden but take place gradually
is difficult. No allowance can, therefore, be made for such
quality changes in computing consumer price indices.

3. Another important requisite in the collection of retail
prices is that they should be those actually charged to consumers
for cash sales. Account should be taken of discounts
(if any) given automatically to all customers, and sales tax, etc.,
payable by them. It is necessary to see that the retail outlets
chosen for the collection of retail prices should be such as can
yield an average price, representative of the price that is being
paid by the population group to which the index relates.

4. If during a period of rationing or price control exorbitant
prices are charged openly to the groups to which the index
applies, such prices should be taken into consideration along
with the controlled prices.

5. Attention should also be paid to the methods of price
collection and the price collection personnel. Prices are
collected usually by special agents or mailed questionnaires.
Where special agents are employed, it is essential to give them
intensive training. They may be supplied with a manual of
instructions and a manual of specifications of items to be priced.
The collected prices should be checked by obtaining duplicate
prices through different agents or by making actual purchase of
the goods priced.

Method of Compilation of Consumer Price Index 

After determining the weights and collecting the prices of
the selected commodities and services, the Consumer Price
Index Number is compiled with the help of Laspeyres' formula,
which is a weighted average of price relatives. In this formula
the weights are based on the values of expenditure during the
base year.



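The formula is :

$$\frac{\sum p_n q_0}{\sum p_0 q_0} \times 100$$

where the $p_n$'s are the prices of the current period, the $p_0$'s the
prices of the base period and the $q_0$'s the quantities of the base
period. The formula can be re-written as :

$$\frac{\sum \left( \dfrac{p_n}{p_0} \times p_0 q_0 \right)}{\sum p_0 q_0} \times 100$$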
In practice, the average price of each item for the base period is
calculated, known as the 'base price' of a particular item.
Again, for the current period an average price in the form of
a simple average of weekly quotations is taken. Taking the base
price as 100, the ratio of price change for each item is expressed
as a percentage and that is called the 'price relative' of a particular
item. Where there are different varieties of a particular
commodity, a price relative is calculated for each variety, and
then a simple average of such price relatives is taken. The
price relative of each item, thus arrived at, is multiplied by its
corresponding weight and the sum of these products for all items
is divided by the sum of the weights of the items, thus giving us
the group index number.

Each of the group index numbers is then multiplied by
its corresponding group weight, and the sum of the products of
different groups is divided by the sum of the group weights to
give us the consumer price index number for that period.
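
That the two forms of Laspeyres' formula lead to the same figure can be checked numerically. The prices and quantities below are hypothetical. The item names and function names are assumptions made only for this sketch.

```python
def laspeyres_aggregate(base_prices, current_prices, base_quantities):
    """Ratio of aggregates: sum(p_n * q_0) / sum(p_0 * q_0) x 100."""
    numerator = sum(current_prices[i] * base_quantities[i] for i in base_prices)
    denominator = sum(base_prices[i] * base_quantities[i] for i in base_prices)
    return 100.0 * numerator / denominator


def laspeyres_from_relatives(base_prices, current_prices, base_quantities):
    """Weighted average of price relatives, weights = base-year expenditure."""
    weights = {i: base_prices[i] * base_quantities[i] for i in base_prices}
    relatives = {i: 100.0 * current_prices[i] / base_prices[i] for i in base_prices}
    return sum(relatives[i] * weights[i] for i in base_prices) / sum(weights.values())


# Hypothetical budget of three items.
p0 = {"rice": 10.0, "cloth": 4.0, "fuel": 2.0}   # base-period prices
pn = {"rice": 12.0, "cloth": 5.0, "fuel": 2.0}   # current-period prices
q0 = {"rice": 20.0, "cloth": 10.0, "fuel": 30.0}  # base-period quantities

index_a = laspeyres_aggregate(p0, pn, q0)
index_b = laspeyres_from_relatives(p0, pn, q0)
```

Here the base-year expenditure is 300 and the same quantities cost 350 at current prices, so both functions give an index of about 116.7. The second form is the one used in practice, since the weights come directly from the family budget inquiry.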

Consumer Price Index Numbers in India. The construction and
maintenance of consumer price index numbers in this country
dates back to the period immediately following the First World
War. The sharp rise in prices which took place towards and
after the end of World War I focussed the attention of several
provincial governments on the problem of rising cost of living
and led to the conduct of socio-economic inquiries among
working classes as a preliminary to the measurement of cost of
living. Family budget inquiries were conducted in Bihar,
Bombay, Sholapur and Ahmedabad. These inquiries were
not quite scientific. The recommendations of the Royal Commission
on Labour, however, gave an impetus for the conduct
of family budget inquiries on more scientific lines and for the
construction of series of cost of living index numbers. The
Second World War again brought in its wake a sharp rise in
prices and the question of compensating employees through
dearness allowances came to the forefront. The Rau Court of
Inquiry, that was constituted to investigate the question of
dearness allowance for railway employees, observed in its
report (1941) that the first requisite of any satisfactory revision
of allowances is the preparation of up-to-date cost of living
index numbers for three distinct classes of areas (city, urban
and rural) and accordingly recommended that the question of
preparing and maintaining such figures be considered by the
Government of India. The Government of India accepted the
recommendation of the Rau Committee and took upon itself
the responsibility for the compilation and maintenance of cost
of living index numbers for important centres in this country.
The cost of living index scheme was initiated by the Government
in 1941 and during the years 1945-46 family budget
inquiries were conducted in 22 industrial centres. During the
period of the Second World War family budget inquiries were
also conducted by several provinces and princely states. On
the basis of these inquiries several new series of working class
cost of living index numbers were started after the Second
World War. The passing of the Minimum Wages Act, 1948,
which makes it necessary for the State Governments to maintain
cost of living indices for employees in certain unorganised industries
coming within the purview of the Act, gave further impetus
to the compilation of working class cost of living index
numbers.

There are at present more than 100 series of index numbers
available in India. Most of these series relate to the working class
employed in manufacturing industries while a few relate to the middle
class and a few others to particular groups of population such as tea
plantation workers, rural population, etc. 20 of these series are
issued by the Labour Bureau and the remaining are constructed
by concerned departments of the respective State Governments.




The Labour Bureau also publishes an interim series of All-India 
Working Class Consumer Price Index Number which is based 
on the indices of 2? individual centres. 

These indices are designed to measure the change in the
retail prices of goods and services during the current period
as compared with their respective prices in the base period.
Of the 20 indices compiled by the Labour Bureau 15 are on
a common base, 1944, now shifted to 1949, and in the remaining
five the base years vary from 1949 to 1953. The base
periods of indices compiled by the States vary, though some of
the States have in each case adopted a common base period
for all centres for which they compile consumer price indices.
These indices do not take into account changes in the living
standards and are strictly price indices. Almost all these
indices relate to working classes. A few of these relate to
plantation workers and most of the others to workers in organised
industries. The weights used are based on the results
of family budget inquiries. Price data are obtained by personal
visits of special agents to the selected retail shops situated in
predominantly working class localities. The indices are
weighted averages of price relatives on fixed base, the weights
being proportional to the expenditure needed in the average
budget derived from the family budget inquiry. These
indices are generally compiled in two stages. First, group
indices are computed and then are combined into the
general index which is the weighted average of the group
indices. All these indices are on a monthly basis though some
are compiled even weekly.

The family budget inquiry at each centre was separately
designed and conducted. In each case a complete list of
workers to be covered was prepared on the basis of the pay-rolls
of factories. Where organised working class localities
existed, a list of working class tenements was prepared. These
lists served as frames for the selection of samples. The sampling
fraction was decided on the basis of practical considerations
and the required number of samples was drawn from the lists,
generally by systematic sampling. The units so sampled
were then visited and detailed information was collected
by the interview method, among other things under such
heads as the composition of the family, income of the family,
consumption in terms of quantities, expenditure on various
goods and services, indebtedness, housing conditions, etc.
In the case of the rejection of the sampled unit due either
to the non-availability of the worker at the address given or
his not fulfilling any required condition, substitution was
generally resorted to by taking the unit preceding or following
the sampled unit in the frame. The informant was generally
the head of the family and the period of reference was generally
a week or a month preceding the interview.
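
Systematic sampling from a frame, with substitution of a neighbouring unit, may be sketched as follows. The frame of twenty workers is hypothetical. The choice to try the following unit before the preceding one is an assumption of this sketch, since the text says only that one or the other was taken.

```python
import random


def systematic_sample(frame, sample_size, seed=None):
    """Return the positions in `frame` selected by systematic sampling.

    A random start is chosen within the first sampling interval and every
    `interval`-th unit is taken thereafter.
    """
    interval = len(frame) // sample_size
    if interval < 1:
        raise ValueError("sample_size exceeds the size of the frame")
    start = random.Random(seed).randrange(interval)
    return [start + k * interval for k in range(sample_size)]


def substitute(frame, position, is_eligible):
    """Position of the following (else preceding) eligible unit, or None."""
    for neighbour in (position + 1, position - 1):
        if 0 <= neighbour < len(frame) and is_eligible(frame[neighbour]):
            return neighbour
    return None


# Hypothetical frame prepared from factory pay-rolls.
frame = ["worker-%02d" % i for i in range(20)]
positions = systematic_sample(frame, sample_size=5, seed=1)
replacement = substitute(frame, positions[0], is_eligible=lambda unit: True)
```

With a frame of 20 and a sample of 5, the sampling interval is 4, so the selected positions are always four apart whatever the random start.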

It has been stated earlier that the consumer price indices in
India are prepared for individual centres. The Labour Bureau
has recently published a series of All-India Average Consumer
Price Index Numbers for Working Class (Interim series). In view
of the wide differences in consumption habits and price trends
in the various parts of this country, an All-India Consumer
Price Index does not seem to possess any real significance.
There are, however, certain problems of all-India nature for
which an All-India Index is useful. For the construction of
an index of an all-India nature the most satisfactory procedure
would be to conduct fresh family budget inquiries on a well
designed basis so as to lead to a number of series of indices
representative of the country as a whole and to combine these
series suitably into a unified index. Pending such a development,
the Labour Bureau has tried to obtain an All-India
Index by combining indices of some of the individual centres.
Some of the series included in the All-India Index are compiled
by the Labour Bureau itself and others by the various State
Governments.

The base period of the All-India series was 1944. This base
has now been shifted to 1949 as a consequence of the base-shifting
of the series of individual centres. This means that
state series of indices included in the All-India Index, which
were on different base periods, have been shifted to base 1949.
This kind of adjustment of the base was required in the case of
those series which are prepared by the Labour Bureau which
were on base 1944. All these component series of indices
relate to industrial labour employed in factories only. The
selection of the series for inclusion in the All-India Index has
been done in such a manner that they might represent different
regions of the country. The All-India Index is a weighted
average of the final indices of all the component series shifted,
where necessary, to 1949 as base. The weights used are the
figures of factory employment during 1949 at various centres
and also in various states.
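
The shifting of the base of a series is a matter of simple proportion: every figure is divided by the figure of the new base year and multiplied by 100. The index values below are hypothetical and serve only to show the arithmetic. The function name is an assumption of this sketch.

```python
def shift_base(series, new_base_year):
    """Re-express an index series so that `new_base_year` equals 100."""
    base_value = series[new_base_year]
    return {year: 100.0 * value / base_value for year, value in series.items()}


# Hypothetical series on base 1944 = 100.
on_1944 = {1944: 100.0, 1949: 125.0, 1950: 130.0}
on_1949 = shift_base(on_1944, new_base_year=1949)
```

The shifted series reads 80 for 1944, 100 for 1949 and 104 for 1950. The ratio between any two years is the same as before, since only the scale has changed.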

This index is compiled in two stages. First, state indices are
compiled by taking a weighted average of the indices of the
different centres selected from each state. The weights used
are the factory employment during 1944 at each one of these
centres. If only one series has been selected for a state then
such series serves as the state series. The second stage consists
in combining the state series as obtained above by means of
weighted average. The weights used for this purpose are total
number employed in factories during 1944 in the various states.
The weighted average of state indices represents the All-India
Average Working Class Consumer Price Index Number. In
the above index, only those centres have been included where
consumer price index numbers are compiled. In the State of
Maharashtra there are as many centres as five (Bombay City,
Sholapur, Jalgaon, Akola, Nagpur), while on the other hand
Madras, Punjab and U.P. have one centre each (Madras City,
Ludhiana and Kanpur respectively). This differential representation
of various States is, to some extent, removed by the
weighting system adopted on the basis of distribution of factory
workers employed in the various States in 1949. In spite of
these limitations, the All-India Working Class Consumer Price
Index Number is the only convenient tool for making adjustments
in the wage level at the all-India level.
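
The two stages of compilation may be sketched as below. The centres, states, indices and employment figures are all hypothetical placeholders and bear no relation to the actual figures used by the Labour Bureau. The function names are assumptions of this sketch.

```python
def weighted_average(indices, weights):
    """Weighted arithmetic average of `indices`, keyed by name."""
    total = sum(weights[name] for name in indices)
    return sum(indices[name] * weights[name] for name in indices) / total


def all_india_index(centre_indices, centre_employment, state_employment):
    """Stage 1: centre indices -> state indices. Stage 2: states -> all-India."""
    state_indices = {
        state: weighted_average(centres, centre_employment)
        for state, centres in centre_indices.items()
    }
    return weighted_average(state_indices, state_employment), state_indices


# Hypothetical data: State A has two centres, State B only one.
centre_indices = {
    "State A": {"Centre 1": 110.0, "Centre 2": 120.0},
    "State B": {"Centre 3": 100.0},
}
centre_employment = {"Centre 1": 300.0, "Centre 2": 100.0, "Centre 3": 50.0}
state_employment = {"State A": 600.0, "State B": 400.0}

overall, by_state = all_india_index(
    centre_indices, centre_employment, state_employment
)
```

State B has a single centre, whose index therefore serves as the state index. In the second stage that state still receives its full weight of total factory employment.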



New Series of Consumer Price Index Numbers for
Industrial Workers

It was recommended in the Second Five Year Plan that steps
should be taken to institute family living surveys for the revision
of the present series of Consumer Price Indices. Accordingly
fresh surveys were conducted at 50 important factory, mining
and plantation centres of the country during September 1958 to
August 1959. On the basis of the results of these surveys
new series of consumer price index numbers for industrial
workers are now compiled by the Labour Bureau. Calendar
year 1960 is taken as base period for all these new series of
index numbers. An enlarged list of six groups divided into
several sub-groups has been taken into consideration in compiling
the new series. The number of items included is about
100 in each case. Index numbers have been compiled on the
basis of Laspeyres' formula as weighted averages of price relatives,
the weights being the relative expenditure as revealed by
the surveys. Prices are collected by the officials of Labour or
Statistics Department of the State Governments through
personal visits to the markets every week. For some of the
commodities like tea, cigarettes, toilet soap, etc., prices are
collected once in a month. Arrangements have also been made
for periodical rent surveys. Special adjustments are made in
case of highly seasonal sub-groups like fruits and vegetables, etc.
New series of Consumer Price Index Numbers are available
from January 1961 onwards.

Limitations of Consumer Price Index Numbers

Consumer Price Index Number, being a statistical measure, 
is constructed on the basis of certain assumptions and thus 
it is to be studied in the light of such assumptions. Some 
limitations of consumer price index numbers are as follows : 

1. Like other price index numbers, the consumer price index
number at best is an average. It measures the effect of percentage
change in the prices of a group of commodities and services
consumed by a given group of population in an area over a
given period of time. Within the population group, there are
variations among the various families in regard to income, size
and consumption pattern, etc. But the index, at best, reflects
an average consumption pattern, an average size of the family,
etc.

2. The Index is an Estimate. In constructing a consumer price
index number, methods of sampling are applied, e.g., the consumption
pattern studied is of a sample of families out of the
total families of the given population group. Similarly, only a
selected list of commodities and services is included in the index.
Moreover, only the popular varieties are taken into consideration
for measuring the effect of price changes instead of all the
varieties consumed by that group of population. Further, as
regards collection of prices, prices are collected only from a few
selected shops and that too on selected days to represent the
price changes for the period (generally a month) as a whole.
Since methods of sampling are used at every stage, the consumer
price index number is to be interpreted subject to statistical or
sampling errors. Therefore, this index reflects the true value
only within certain limits.

3. The Index is a price index and not a cost of living index. As
said earlier, the consumer price index number measures the
effect of one factor only and that is, price changes. Other
factors which affect the cost of living, such as changes in income,
tastes, consumption pattern (qualities and quantities of goods
consumed) are not taken into consideration in compiling this
index number. Therefore, this index number does not measure
the changes in the actual living costs, but measures only the
effect of price changes.

It is necessary to revise a series of consumer price index
numbers at intervals of 10 or 15 years to take into account the
changed conditions without, however, changing the conditions
in the base period, e.g., if a variety of a particular commodity,
selected for the index, is not available in the market and some
other variety has been introduced, the new variety is included
in the index after eliminating the difference in quality
between the two varieties.



4. The indices for two different centres cannot be compared to indicate
the relative level of prices. Consumer price index numbers of
two different centres are not comparable. Firstly, the base
period of the two indices may be different. Even if the two
indices have a common base, a true comparison of the relative
level of prices of the two centres is not possible. Thus if
the consumer price index numbers for two different centres
having a common base and for the same class of population,
say working class, show different figures, we cannot say
that one particular centre is costlier than the other centre.
It only means that prices in one centre have risen faster
than in the other centre since the common base period. (In
order to measure the relative level of prices of two centres, a
special type of index number is required, which makes allowance
for the differences in the size of the population, their levels
of income and consumption pattern, etc.)

Lastly, it is not necessary that the trend reflected by the wholesale
price index number should be similar to that revealed by
the consumer price index numbers. One possible reason for
this is the lag between the movement in wholesale prices and
consumer prices. Moreover, consumer prices are less sensitive
to market conditions than wholesale prices. Further, there is
also a difference in the system of weighting in the two index
numbers. As such different trends may be revealed by the two
index numbers. But over a long period of time some degree of
similarity in the trend is reflected by them for comparable
groups, say, Food Group Indices, etc.



APPENDICES 




APPENDIX I


LOGARITHMS 

Logarithm of a given number to a certain base is the power
to which that base must be raised in order to obtain the given
number. Thus the logarithm (or simply log) of 1024 to base
4 is 5. This means that if 4 (base) is raised to the power 5 it
will give us the required number.

The system of logarithms that is commonly used is on the
base 10, so that when we speak of the logarithm of a number
what we really mean is the power to which 10 must be raised
in order to obtain that number. Thus the logarithm of 100 is 2
for 10² = 100. Logarithm of 1,000 is 3 for 10³ = 1,000, and so on.


Natural Number                         The logarithm

     1,000            10³                  3.000
       100            10²                  2.000
        10            10¹                  1.000
         1            10⁰                  0.000
        .1            10⁻¹                -1 or 1̄
       .01            10⁻²                -2 or 2̄


It can be seen from the above that the log of 10 is 1 and
log of 100 is 2. Hence the log of any number in between 10
and 100 would be in between 1 and 2, which means that the
log for any number greater than 10 and less than 100 will be
1 plus a fraction. Thus a logarithm may consist of two parts :
(i) an integral part, and (ii) a decimal part. The 'integral
part' is called the Characteristic and the 'decimal part' the
Mantissa.

Finding the Logarithm. To find the logarithm of a number
we will have to find out the characteristic as well as the mantissa.

Finding the Characteristic. To determine the characteristic
of a given number the following rules are applied :





(i) The characteristic of any number greater than 1 is
positive and is one less than the number of digits to the
left of the decimal point.

(ii) The characteristic of any number less than 1 is negative
and is greater by one than the number of zeroes between
the decimal point and the first significant figure
of the given number.

Following the above rules the characteristics of a few numbers
are as below :

Number              Characteristic

3,758                     3
375.8                     2
37.58                     1
3.758                     0
0.3758                   -1 or 1̄
0.03758                  -2 or 2̄
0.003758                 -3 or 3̄


The minus sign of the characteristic is written at the top of
the figure and is usually designated as bar 1, bar 2, bar 3, and
bar 4 etc.
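The two rules can be sketched as a short Python function. This is only an illustration: the name characteristic is hypothetical, and the number is assumed to be positive and passed as text so that its digits can be counted exactly as the rules describe.

```python
def characteristic(number: str) -> int:
    """Characteristic of a positive decimal number written as text, e.g. '375.8'."""
    whole, _, fraction = number.replace(",", "").partition(".")
    whole = whole.lstrip("0")
    if whole:
        # Rule (i): one less than the number of digits left of the decimal point.
        return len(whole) - 1
    zeros = len(fraction) - len(fraction.lstrip("0"))
    if zeros == len(fraction):
        raise ValueError("the number must be greater than zero")
    # Rule (ii): negative, and one more than the zeroes after the decimal point.
    return -(zeros + 1)

for text in ("3,758", "375.8", "37.58", "3.758", "0.3758", "0.03758", "0.003758"):
    print(text, characteristic(text))
```

A negative result such as -2 corresponds to what the text writes as bar 2.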


Finding the Mantissa. The mantissa of a given number can be
found in the logarithmic table. The first step in the process
of finding the mantissa is to obtain the significant figures of the
given number. To get the significant figures omit all zeroes
before the first and after the last non-zero digit. Thus 375 is
the significant figure of 37,500 or .00375 or .375.

The mantissa of all numbers having the same significant
figures is the same. Thus the mantissa of 8, 80, 800, 8,000 or
.008 is the same.


The first two significant figures of the number are then found
at the extreme left of the logarithmic table. Thus if we have
to find the mantissa of 375 we look for 37 at the extreme left of
the logarithmic table and then move along the horizontal line
to the number in the vertical column headed by the third figure
5; there we obtain the mantissa 5740.

log 375 = 2.5740

If, however, the significant figures of a number comprise
only one digit, two zeroes may be added to its right (if they
consist of two digits, one zero may be added) so that the
method of finding the mantissa explained above may be
adopted. Thus the mantissa of 8, 80, or 800 is 9031.

If a number comprises four significant digits the mantissa
is found by using the mean difference columns at the extreme right.
Thus to find the log of 3,758 we proceed as:

Mantissa of log 375      = 5740
Mean difference for 8    =    9

or log of 3,758 = 3.5749

Similarly log of 37.58 = 1.5749

Mantissa is always a positive number. 
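As a rough check on the table method, the characteristic and mantissa can also be computed directly. The sketch below is in Python and the function name split_log is an assumption made for illustration. It shows that numbers with the same significant figures share one mantissa, and that the mantissa stays positive even when the characteristic is negative. A computed mantissa may differ from the four-figure table value in the last figure.

```python
import math

def split_log(number: float) -> tuple[int, float]:
    """Return (characteristic, mantissa) of the common logarithm of a positive number."""
    log = math.log10(number)
    characteristic = math.floor(log)   # integral part; may be negative
    mantissa = log - characteristic    # decimal part; always in [0, 1)
    return characteristic, mantissa

for n in (3758, 37.58, 0.003758):
    c, m = split_log(n)
    print(n, c, round(m, 4))
```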

ANTILOGARITHMS

To find the natural number of a given logarithm we have
to make use of the table of antilogarithms. The mantissa will
enable us to obtain the different digits of the given number
and the characteristic will determine the location of the decimal
point.

Illustration:

Find the number whose log is 2.4725.

Solution:

From Antilog tables, entry for .472    = 2,965
Mean difference for 5                  =     3
Entry for .4725                        = 2,968

Hence the number whose log is 2.4725 = 296.8

Similarly the number whose log is 2̄.4725 = .02968
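The same two results can be reproduced without tables. In this Python sketch (the function name antilog is hypothetical) the mantissa fixes the digits and the characteristic fixes the decimal point, as described above; a bar characteristic is passed as a negative integer.

```python
def antilog(characteristic: int, mantissa: float) -> float:
    """Number whose common logarithm has the given characteristic and positive mantissa."""
    return 10 ** (characteristic + mantissa)

print(round(antilog(2, 0.4725), 1))    # log 2.4725
print(round(antilog(-2, 0.4725), 5))   # log bar 2 .4725
```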

The Use of Logarithms

(1) Multiplication. If two or more numbers are to be
multiplied find the sum of their logarithms. The number
corresponding to this logarithm, obtained from the table of
antilogarithms, is the product required.

Multiply 32.87 by 0.00238.





Solution:

log 32.87      = 1.5168
log 0.00238    = 3̄.3766

Sum            = 2̄.8934 = log .07823

Hence 32.87 × 0.00238 = 0.07823
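The addition of logarithms with a bar characteristic can be sketched as below. This is an illustrative Python helper, not the author's procedure: the name add_logs and the pair form (characteristic, mantissa) are assumptions. The mantissas are added as positive decimals and any whole unit is carried to the characteristic.

```python
def add_logs(a: tuple[int, float], b: tuple[int, float]) -> tuple[int, float]:
    """Add two logarithms given as (characteristic, mantissa) pairs."""
    characteristic = a[0] + b[0]
    mantissa = a[1] + b[1]
    if mantissa >= 1:              # carry one from the mantissa to the characteristic
        characteristic += 1
        mantissa -= 1
    return characteristic, round(mantissa, 4)

total = add_logs((1, 0.5168), (-3, 0.3766))   # log 32.87 + log 0.00238
print(total)                                  # characteristic -2 stands for bar 2
print(10 ** (total[0] + total[1]))            # the product, close to 32.87 * 0.00238
```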

(2) Division. If a number is to be divided by another,
subtract the logarithm of the latter from the logarithm of the
former. The difference so obtained is the logarithm of the
quotient, the number corresponding to which can be found
from the table of antilogs.
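The subtraction can be sketched in the same way. In this Python illustration the name subtract_logs is hypothetical, and the logarithms of 0.342 and 0.0902 are entered as they would be read from the tables. When the mantissa of the divisor is the larger, one unit is borrowed from the characteristic so that the mantissa of the result stays positive.

```python
def subtract_logs(a: tuple[int, float], b: tuple[int, float]) -> tuple[int, float]:
    """Subtract logarithm b from logarithm a, both as (characteristic, mantissa) pairs."""
    characteristic = a[0] - b[0]
    mantissa = a[1] - b[1]
    if mantissa < 0:               # borrow one from the characteristic
        characteristic -= 1
        mantissa += 1
    return characteristic, round(mantissa, 4)

difference = subtract_logs((-1, 0.5340), (-2, 0.9552))   # log 0.342 - log 0.0902
print(difference)
print(10 ** (difference[0] + difference[1]))             # the quotient
```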

Illustration:

Multiply and divide 0.342 by 0.0902.

Solution:

log 0.342      = 1̄.5340
log 0.0902     = 2̄.9552

Sum            = 2̄.4892 = log 0.03084
Difference     = 0.5788 = log 3.791

Hence 0.342 × 0.0902 = 0.03084
and   0.342 ÷ 0.0902 = 3.791

(3) Power of a Number. The square, cube or other power of
a number is obtained by multiplying the logarithm of the number
by the exponent of the power; the product is the logarithm
of the required value.


Illustration:

Calculate the value of (0.07)³.

Solution:

log (0.07)     = 2̄.8451
log (0.07)³    = 3 × 2̄.8451
               = 4̄.5353
               = log .000343

(0.07)³ = .000343
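The multiplication of a bar logarithm by an exponent can be sketched as follows. This Python helper is only an illustration and the name multiply_log is an assumption. The characteristic and the positive mantissa are multiplied separately, and the whole units produced by the mantissa are carried to the characteristic.

```python
import math

def multiply_log(log: tuple[int, float], exponent: int) -> tuple[int, float]:
    """Multiply a logarithm, given as (characteristic, mantissa), by a whole exponent."""
    characteristic, mantissa = log
    product = exponent * mantissa      # mantissa part, still positive
    carry = math.floor(product)        # whole units produced by the mantissa
    return exponent * characteristic + carry, round(product - carry, 4)

cube = multiply_log((-2, 0.8451), 3)   # 3 times bar 2 .8451
print(cube)
print(10 ** (cube[0] + cube[1]))       # close to 0.07 cubed
```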

(4) Root of a Number. If it is desired to extract any root of
a given number, divide the logarithm of the number by the
number which indicates the root.

The division of a logarithm presents some difficulty when
the characteristic is negative and is not exactly divisible by the
given value of the root. (It should be remembered that the
mantissa is always positive.) The best way to overcome this
difficulty is to make the characteristic exactly divisible and add
a compensating figure to the mantissa.

Illustration:

Solve √144.

Solution:

log 144 = 2.1584
Dividing 2.1584 by 2 we get
1.0792 = log 12
∴ √144 = 12

Illustration:

Solve √0.625.

Solution:

log √0.625 = ½ (1̄.7959) = ½ (2̄ + 1.7959)
           = 1̄.8979
           = log 0.7905
or √0.625 = 0.7905
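The device of making the characteristic exactly divisible can be sketched as below. This is an illustrative Python helper and the name divide_log is hypothetical. It relies on Python's % operator returning a non-negative remainder for a positive divisor: that remainder is taken off the characteristic and added to the mantissa as the compensating figure.

```python
def divide_log(log: tuple[int, float], root: int) -> tuple[int, float]:
    """Divide a logarithm, given as (characteristic, mantissa), by a whole root."""
    characteristic, mantissa = log
    # Lower the characteristic to the nearest multiple of the root and
    # add the same amount to the mantissa, so the value is unchanged.
    compensation = characteristic % root
    characteristic -= compensation
    mantissa += compensation
    return characteristic // root, round(mantissa / root, 4)

print(divide_log((2, 0.1584), 2))        # square root of 144
half = divide_log((-1, 0.7959), 2)       # square root of 0.625: bar 1 .7959 halved
print(half)
print(10 ** (half[0] + half[1]))
```

The computed mantissa may differ from the table value in the fourth figure because of rounding.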




LOGARITHMS

[Table of logarithms, with mean difference columns]

ANTILOGARITHMS

[Table of antilogarithms, with mean difference columns]

RECIPROCALS

(Numbers in Mean Difference Columns to be subtracted)

[Table of reciprocals, with mean difference columns]

APPENDIX II


SELECTED REFERENCES

Allen, R. G. D., Statistics for Economists. (Hutchinson)
Arkin, H. & Colton, R. R., Statistical Methods. (Barnes & Noble, New York)
Boddington, A. L., Statistics and Their Application to Commerce. (H. F. L., London)
Bowley, A. L., Elements of Statistics. (P. S. King & Staples, London)
Chambers, E. G., Statistical Calculations. (Cambridge University Press, London)
Crum, Patton & Tebbutt, Economic Statistics. (McGraw-Hill)
Croxton & Cowden, Applied General Statistics. (Prentice Hall)
Edwards, A. L., Statistical Analysis for Students in Psychology and Education. (Rinehart & Co., Boston)
Fisher, Irving, The Making of Index Numbers. (Houghton Mifflin Co., Boston)
Ghosh & Elhance, Indian Statistics. (Indian Press)
Hill, A. Bradford, Principles of Medical Statistics. (The Lancet Ltd., London)
Jones, D. C., A First Course in Statistics. (G. Bell & Sons, London)
King, W. I., Elements of Statistical Method. (Macmillan & Co.)
Lewis, E. E., Methods of Statistical Analysis in Economics and Business. (Houghton Mifflin Co., Boston)
Lindquist, E. F., A First Course in Statistics. (Houghton Mifflin Co., Boston)
Mills, F. C., Statistical Methods. (Henry Holt, New York)
Mode, E. B., The Elements of Statistics. (Prentice Hall)
Mounsey, J., Introduction to Statistical Calculations. (English Universities Press, London)
Moroney, M. J., Facts from Figures. (Penguin Books)
Neiswanger, W. A., Elementary Statistical Methods. (Macmillan, New York)
Paden & Lindquist, Statistics for Economics and Business. (McGraw-Hill, New York)
Rao, V. K. R. V., National Income of British India, 1931-32. (Macmillan, London)
Richardson, C. H., An Introduction to Statistical Analysis. (Harcourt, Brace and Company, New York)
Riggleman & Frisbee, Business Statistics. (McGraw-Hill)
Riegel, R., Elements of Business Statistics. (Appleton, London)
Secrist, H., An Introduction to Statistical Methods. (Macmillan, New York)
Smith & Duncan, Elementary Statistics and Their Application. (McGraw-Hill)
Tippett, L. H. C., Statistics. (Oxford University Press)
Walker, H. M., Elementary Statistical Methods. (Henry Holt, New York)
Waugh, A. E., Elements of Statistical Methods. (McGraw-Hill, New York)
Yule & Kendall, An Introduction to the Theory of Statistics. (Charles Griffin & Company, London)








